Abstract

This paper deals with the nonparametric density estimation of the regression error term, assuming its
independence from the covariate. The difference between the feasible estimator, which uses the estimated
residuals, and the unfeasible one, which uses the true residuals, is studied. An optimal choice of the bandwidth
used to estimate the residuals is given. We also study the asymptotic normality of the feasible kernel estimator
and its rate-optimality.
1 Introduction

Consider a sample $(X, Y), (X_1, Y_1), \ldots, (X_n, Y_n)$ of independent and identically distributed (i.i.d.) random
variables, where $Y$ is the univariate dependent variable and the covariate $X$ is of dimension $d$. Let $m(\cdot)$ be
the conditional expectation of $Y$ given $X$ and let $\varepsilon$ be the related regression error term, so that the
regression error model is

$$Y_i = m(X_i) + \varepsilon_i, \quad i = 1, \ldots, n. \tag{1.1}$$

We wish to estimate the probability density function (p.d.f.) of the regression error term, $f(\cdot)$, using the
nonparametric residuals. Our potential applications are as follows. First, an estimate of the p.d.f. of $\varepsilon$ is an
important tool for understanding the behavior of the residuals, and therefore the fit of the regression model (1.1).
This estimate of $f(\cdot)$ can be used for goodness-of-fit tests of a specified error distribution in a parametric
regression setting. Some examples can be found in Loynes (1980), Akritas and Van Keilegom (2001), Cheng
and Sun (2008). The estimation of the density of the regression error term can also be useful for testing the
symmetry of the residuals. See Ahmad and Li (1997), Dette et al. (2002). Another interest of the estimation
of $f$ is that it can be used for constructing nonparametric estimators for the density and hazard function of
$Y$ given $X$, as related in Van Keilegom and Veraverbeke (2002). This estimation of $f$ is also important when
we are interested in the estimation of the p.d.f. of the response variable $Y$. See Escanciano and Jacho-Chavez
(2010). Note also that an estimation of the p.d.f. of the regression errors can be useful for proposing a mode
forecast of $Y$ given $X = x$. This mode forecast is based on an estimation of $m(x) + \arg\max_{\varepsilon \in \mathbb{R}} f(\varepsilon)$.

Relatively little is known about the nonparametric estimation of the p.d.f. and the cumulative distribution
function (c.d.f.) of the regression error. With few exceptions, the nonparametric literature focuses on
studying the distribution of $Y$ given $X$. See Roussas (1967, 1991), Youndje (1996) and references therein.
Akritas and Van Keilegom (2001) estimate the cumulative distribution function of the regression error in a
heteroscedastic model. The estimator proposed by these authors is based on a nonparametric estimation
of the residuals. Their results show the impact of the estimation of the residuals on the limit distribution
of the underlying estimator of the cumulative distribution function. Müller, Schick and Wefelmeyer (2004)
consider the estimation of moments of the regression error. Quite surprisingly, under appropriate conditions,
the estimator based on the true errors is less efficient than the estimator which uses the nonparametric
estimated residuals. The reason is that the latter estimator makes better use of the fact that the regression
error $\varepsilon$ has mean zero. Efromovich (2005) considers adaptive estimation of the p.d.f. of the regression error. He
gives a nonparametric estimator based on the estimated residuals, for which the Mean Integrated Squared Error
(MISE) attains the minimax rate. Fu and Yang (2008) study the asymptotic normality of the estimators
of the regression error p.d.f. in nonlinear autoregressive models. Cheng (2005) establishes the asymptotic
normality of an estimator of $f(\cdot)$ based on the estimated residuals. This estimator is constructed by splitting
the sample into two parts: the first part is used for the construction of the estimator of $f(\cdot)$, while the second
part of the sample is used for the estimation of the residuals.

The focus of this paper is to estimate the p.d.f. of the regression error using the estimated residuals,
under the assumption that the covariate $X$ and the regression error $\varepsilon$ are independent. In such a setup, it
would be unwise to use a conditional approach based on the fact that $f(\varepsilon) = f(\varepsilon|x) = \varphi(m(x) + \varepsilon|x)$, where
$\varphi(\cdot|x)$ is the p.d.f. of $Y$ given $X = x$. Indeed, the estimation of $m(\cdot)$ and $\varphi(\cdot|x)$ is affected by the curse of
dimensionality, so that the resulting estimator of $f(\cdot)$ would have a considerably slow rate of convergence if
the dimension of $X$ is high. The approach proposed here uses a two-step procedure which, in a first step,
replaces the unobserved regression error terms by some nonparametric estimator $\hat\varepsilon_i$. In a second step, the
estimated $\hat\varepsilon_i$'s are used to estimate $f(\cdot)$ nonparametrically, as if they were the true $\varepsilon_i$'s. While proceeding so
can circumvent the curse of dimensionality, a challenging issue is to evaluate the impact of the estimated
residuals on the final estimator of $f(\cdot)$. Hence one of the contributions of our study is to analyze the effect
of the estimation of the residuals on the Kernel estimators of the regression error p.d.f. Next, an optimal choice
of the bandwidth used to estimate the residuals is given. Finally, we study the asymptotic normality of the
feasible Kernel estimator and its rate-optimality.

The rest of this paper is organized as follows. Section 2 presents our estimators and establishes the
asymptotic normality of the (naive) conditional estimator of the density of the regression error term. Sections
3 and 4 group our assumptions and main results. The conclusion of this paper is given in Section 5, while
the proofs of our results are gathered in Section 6 and in an appendix.

2 Some nonparametric conditional estimator of the density of the 
regression error 

To illustrate the potential impact of the dimension $d$ of the $X_i$'s, let us first consider a naive conditional
estimator of the p.d.f. $f(\cdot)$ of the regression error term $\varepsilon$. Let $\varphi(\cdot|x)$ and $f(\cdot|x)$ be respectively the p.d.f. of
$Y$ and $\varepsilon$ given $X = x$. Since $f(\varepsilon|x) = \varphi(m(x) + \varepsilon|x)$, using the independence of $X$ and $\varepsilon$ gives

$$f(\varepsilon) = f(\varepsilon|x) = \varphi(m(x) + \varepsilon|x). \tag{2.1}$$

Consider some Kernel functions $K_0$, $K_1$ and some bandwidths $b_0$, $h_0$ and $h_1$. The expression (2.1) of $f$
suggests using the Kernel nonparametric estimator



where $\hat m_n(x)$ is the Nadaraya-Watson (1964) estimator of $m(x)$ defined as

$$\hat m_n(x) = \frac{\sum_{j=1}^n Y_j K_0\left(\frac{X_j - x}{b_0}\right)}{\sum_{j=1}^n K_0\left(\frac{X_j - x}{b_0}\right)}. \tag{2.2}$$

The first result presented in this paper is the following proposition.



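Proposition 2.1. Define




and suppose that $h_0$ decreases to $0$ such that $n h_0^{d}/\ln n \to \infty$, $\ln(1/h_0)/\ln(\ln n) \to \infty$ and
when $n \to \infty$. Then under Assumptions $(A_1)$-$(A_{10})$ given in the next section, we have

$$\sqrt{n h_0^d h_1}\,\left(f_n(\varepsilon|x) - \bar f_n(\varepsilon|x)\right) \xrightarrow{d} N\left(0, \frac{f(\varepsilon|x)}{g(x)} \int\!\!\int K_0^2(z) K_1^2(v)\,dz\,dv\right),$$

where $g(\cdot)$ is the marginal density of $X$ and

$$\bar f_n(\varepsilon|x) = f(\varepsilon|x) + \frac{h_0^2 \mu_1(x,\varepsilon)}{2g(x)} + \frac{h_1^2 \mu_2(x,\varepsilon)}{2g(x)} + o\left(h_0^2 + h_1^2\right).$$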



This result suggests that an optimal choice of the bandwidths $h_0$ and $h_1$ should achieve the minimum of
the first-order terms of the asymptotic mean square error expansion

$$AMSE\left(f_n(\varepsilon|x)\right) = \left(\frac{h_0^2 \mu_1(x,\varepsilon)}{2g(x)} + \frac{h_1^2 \mu_2(x,\varepsilon)}{2g(x)}\right)^2 + \frac{f(\varepsilon|x) \int K_0^2(z)\,dz \int K_1^2(v)\,dv}{n h_0^d h_1 g(x)}.$$

Elementary calculations yield that the resulting optimal bandwidths $h_0$ and $h_1$ are both proportional to
$n^{-1/(d+5)}$, leading to the exact consistency rate $n^{-2/(d+5)}$ for $f_n(\varepsilon|x)$. In the case $d = 1$, this rate is $n^{-1/3}$,
which is worse than the rate $n^{-2/5}$ achieved by the optimal Kernel estimator of a univariate density. See
Bosq and Lecoutre (1987), Scott (1992), Wand and Jones (1995). Note also that the exponent $2/(d + 5)$
decreases to $0$ with the dimension $d$. This indicates a negative impact of the dimension $d$ on the performance
of the estimator, the so-called curse of dimensionality. The fact that $f_n(\varepsilon|x)$ is affected by the curse of
dimensionality is a consequence of conditioning. Indeed, (2.1) identifies the unconditional $f(\varepsilon)$ with the
conditional distribution of the regression error given the covariate.
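
As a quick numerical illustration of the rates discussed above, the following Python sketch tabulates the exponent $2/(d+5)$ of the naive conditional estimator next to the univariate benchmark $2/5$. The names `conditional_rate_exponent` and `UNIVARIATE_EXPONENT` are ours and only encode the formulas stated in the text; nothing here is estimated from data.

```python
from fractions import Fraction

# Benchmark: exponent of the optimal univariate kernel density rate n^(-2/5).
UNIVARIATE_EXPONENT = Fraction(2, 5)


def conditional_rate_exponent(d: int) -> Fraction:
    """Exponent r such that the naive conditional estimator converges at n^(-r)."""
    if d < 1:
        raise ValueError("the covariate dimension d must be a positive integer")
    return Fraction(2, d + 5)


if __name__ == "__main__":
    for d in (1, 2, 5, 10):
        r = conditional_rate_exponent(d)
        print(f"d={d}: rate n^-({r}) versus benchmark n^-({UNIVARIATE_EXPONENT})")
```

Exact fractions are used so that the comparison with $2/5$ does not depend on floating-point rounding.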

To avoid this curse of dimensionality in the nonparametric kernel estimation of $f(\varepsilon)$, the approach
proposed here builds, in a first step, the estimated residuals

$$\hat\varepsilon_i = Y_i - \hat m_{in}, \quad i = 1, \ldots, n, \tag{2.3}$$

where $\hat m_{in} = \hat m_{in}(X_i)$ is a leave-one-out version of the Kernel regression estimator (2.2),

$$\hat m_{in} = \frac{\sum_{j=1, j \neq i}^n Y_j K_0\left(\frac{X_j - X_i}{b_0}\right)}{\sum_{j=1, j \neq i}^n K_0\left(\frac{X_j - X_i}{b_0}\right)}. \tag{2.4}$$

It is tempting to use, in a second step, the estimated $\hat\varepsilon_i$ as if they were the true residuals $\varepsilon_i$. This would
ignore that the $\hat m_n(X_i)$'s can deliver severely biased estimations of the $m(X_i)$'s for those $X_i$ which are close
to the boundaries of the support $\mathcal{X}$ of the covariate distribution. For that reason, our proposed estimator trims
the observations $X_i$ outside an inner subset $\mathcal{X}_0$ of $\mathcal{X}$,

$$\hat f_{1n}(\varepsilon) = \frac{1}{b_1 \sum_{i=1}^n \mathbf{1}(X_i \in \mathcal{X}_0)} \sum_{i=1}^n \mathbf{1}(X_i \in \mathcal{X}_0)\, K_1\left(\frac{\hat\varepsilon_i - \varepsilon}{b_1}\right). \tag{2.5}$$

This estimator is the so-called two-step Kernel estimator of $f(\varepsilon)$. In principle, it would be possible to assume
that $\mathcal{X}_0$ grows to $\mathcal{X}$ at a rate negligible compared to the bandwidth $b_1$. This would give an estimator close
to the more natural Kernel estimator $\sum_{i=1}^n K_1\left((\hat\varepsilon_i - \varepsilon)/b_1\right)/(n b_1)$. However, in the rest of the paper, a fixed
subset $\mathcal{X}_0$ will be considered for the sake of simplicity.
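
The two-step construction in (2.3)-(2.5) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the author's implementation: the kernels `k0` (a product Epanechnikov kernel on $[-1/2,1/2]^d$) and `k1` (proportional to $(1-v^2)^4$ on $[-1,1]$) are our own choices, the trimming set is taken to be a box `[lower, upper]^d`, and observations whose leave-one-out fit is undefined (no neighbour within the first-step window) are dropped, which the text does not discuss.

```python
import numpy as np


def k0(z):
    """Product Epanechnikov kernel on [-1/2, 1/2]^d; z has shape (..., d)."""
    inside = np.all(np.abs(z) <= 0.5, axis=-1)
    values = np.prod(1.5 * (1.0 - 4.0 * z**2), axis=-1)
    return np.where(inside, values, 0.0)


def k1(v):
    """Kernel (315/256)(1 - v^2)^4 on [-1, 1]: symmetric, integrates to one,
    and three times continuously differentiable (illustrative choice)."""
    return np.where(np.abs(v) <= 1.0, (315.0 / 256.0) * (1.0 - v**2) ** 4, 0.0)


def loo_nadaraya_watson(x, y, b0):
    """Leave-one-out Nadaraya-Watson fit at each X_i, in the spirit of (2.4)."""
    diffs = (x[None, :, :] - x[:, None, :]) / b0  # diffs[i, j] = (X_j - X_i) / b0
    weights = k0(diffs)
    np.fill_diagonal(weights, 0.0)  # exclude observation i from its own fit
    denom = weights.sum(axis=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return (weights @ y) / denom  # NaN when X_i has no neighbour in the window


def two_step_density(eps_grid, x, y, b0, b1, lower, upper):
    """Feasible two-step estimator in the spirit of (2.5), on a grid of error values."""
    residuals = y - loo_nadaraya_watson(x, y, b0)  # first step, as in (2.3)
    # Trimming: keep X_i in the box [lower, upper]^d (assumed form of the inner set);
    # additionally drop undefined residuals (our assumption).
    keep = np.all((x >= lower) & (x <= upper), axis=1) & np.isfinite(residuals)
    kept = residuals[keep]
    u = (kept[None, :] - eps_grid[:, None]) / b1
    return k1(u).sum(axis=1) / (b1 * keep.sum())  # second step
```

Here `x` is an array of shape `(n, d)`, `y` has shape `(n,)`, and `eps_grid` is a one-dimensional grid of evaluation points. The pairwise-difference array has size $n^2 d$, so this direct form is only meant for moderate sample sizes.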

Observe that the two-step Kernel estimator $\hat f_{1n}(\varepsilon)$ is a feasible estimator in the sense that it does not
depend on any unknown quantity, as desirable in practice. This contrasts with the unfeasible ideal Kernel
estimator

$$\tilde f_{1n}(\varepsilon) = \frac{1}{b_1 \sum_{i=1}^n \mathbf{1}(X_i \in \mathcal{X}_0)} \sum_{i=1}^n \mathbf{1}(X_i \in \mathcal{X}_0)\, K_1\left(\frac{\varepsilon_i - \varepsilon}{b_1}\right), \tag{2.6}$$

which depends in particular on the unknown regression error terms. It is however intuitively clear that $\hat f_{1n}(\varepsilon)$
and $\tilde f_{1n}(\varepsilon)$ should be close, as illustrated by the results of the next section.

3 Assumptions

The following assumptions are used in our main results.

$(A_1)$ The support $\mathcal{X}$ of $X$ is a compact subset of $\mathbb{R}^d$ and $\mathcal{X}_0$ is an inner closed subset of $\mathcal{X}$ with non-empty
interior,

$(A_2)$ the p.d.f. $g(\cdot)$ of the i.i.d. covariates $X, X_i$ is strictly positive over $\mathcal{X}$, and has continuous second
order partial derivatives over $\mathcal{X}$,

$(A_3)$ the regression function $m(\cdot)$ has continuous second order partial derivatives over $\mathcal{X}$,

$(A_4)$ the i.i.d. centered error regression terms $\varepsilon, \varepsilon_i$'s, have finite 6th moments, and are independent of the
covariates $X, X_i$'s,

$(A_5)$ the probability density function $f(\cdot)$ has bounded continuous second order derivatives over $\mathbb{R}$ and
satisfies, for $h_p(e) = e^p f(e)$, $\sup_{e \in \mathbb{R}} |h_p^{(k)}(e)| < \infty$, $p \in [0,2]$, $k \in [0,2]$,

$(A_6)$ the p.d.f. $\varphi$ of $(X, Y)$ has bounded continuous second order partial derivatives over $\mathbb{R}^d \times \mathbb{R}$,

$(A_7)$ the Kernel function $K_0$ is symmetric, continuous over $\mathbb{R}^d$ with support contained in $[-1/2, 1/2]^d$ and
$\int K_0(z)\,dz = 1$,

$(A_8)$ the Kernel function $K_1$ has a compact support, is three times continuously differentiable over $\mathbb{R}$, and
satisfies $\int K_1(v)\,dv = 1$ and $\int v K_1(v)\,dv = 0$,

$(A_9)$ the bandwidth $b_0$ decreases to $0$ and satisfies, for $d^* = \sup\{d+2, 2d\}$, $n b_0^{d^*}/\ln n \to \infty$ and
$\ln(1/b_0)/\ln(\ln n) \to \infty$ when $n \to \infty$,

$(A_{10})$ the bandwidth $b_1$ decreases to $0$ and satisfies $n^{(d+8)} b_1^{7(d+4)} \to \infty$ when $n \to \infty$.

Assumptions $(A_2)$, $(A_3)$, $(A_5)$ and $(A_6)$ impose that all the functions to be estimated nonparametrically have
two bounded derivatives. Consequently the conditions $\int z K_0(z)\,dz = 0$ and $\int v K_1(v)\,dv = 0$, as assumed in
$(A_7)$ and $(A_8)$, represent standard conditions ensuring that the biases of the resulting nonparametric estimators
(2.2) and (2.6) are of order $b_0^2$ and $b_1^2$. Assumption $(A_4)$ states independence between the regression error
terms and the covariates, which is the main condition for (2.1) to hold. The differentiability of $K_1$ imposed in
$(A_8)$ is more specific to our two-step estimation method. Assumption $(A_8)$ is used to expand the two-step
Kernel estimator $\hat f_{1n}$ in (2.5) around the unfeasible one $\tilde f_{1n}$ from (2.6), using the residual estimation errors
$\hat\varepsilon_i - \varepsilon_i$ and the derivatives of $K_1$ up to third order. Assumption $(A_9)$ is useful for obtaining the uniform
convergence of the regression estimator $\hat m_n$ defined in (2.2) (see for instance Einmahl and Mason, 2005),
and also gives a similar consistency result for the leave-one-out estimator $\hat m_{in}$ in (2.4). Assumption $(A_{10})$ is
needed in the study of the difference between the feasible estimator $\hat f_{1n}$ and the unfeasible estimator $\tilde f_{1n}$.

4 Main results 

This section is devoted to our main results. The first result concerns the pointwise consistency
of the nonparametric Kernel estimator $\hat f_{1n}$ of the density $f$. Next, the optimal first-step and second-step
bandwidths used to estimate $f$ are proposed. We finish this section by establishing the asymptotic normality
of the estimator $\hat f_{1n}$.

4.1 Pointwise weak consistency 

The next result gives the order of the difference between the feasible estimator and the theoretical density 
of the regression error at a fixed point e. 

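Theorem 4.1. Under $(A_1)$-$(A_5)$ and $(A_7)$-$(A_{10})$, we have, when $b_0$ and $b_1$ go to $0$,

$$\hat f_{1n}(\varepsilon) - f(\varepsilon) = O_P\left(AMSE(b_1) + R_n(b_0, b_1)\right)^{1/2},$$

where

$$AMSE(b_1) = E_n\left[\left(\tilde f_{1n}(\varepsilon) - f(\varepsilon)\right)^2\right] = O_P\left(b_1^4 + \frac{1}{n b_1}\right),$$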



i?„(&oA) = 6q 



d\ 1/2' 



1 



d\ 1/2' 



The result of Theorem 4.1 is based on the evaluation of the difference between $\hat f_{1n}(\varepsilon)$ and $\tilde f_{1n}(\varepsilon)$. This
evaluation gives an indication about the impact of the estimation of the residuals on the nonparametric
estimation of the regression error density.

4.2 Optimal first-step and second-step bandwidths for the pointwise weak consistency

The next result, Theorem 4.2, gives some guidelines for the choice of the optimal bandwidth $b_0$
used in the nonparametric estimation of the regression errors. As far as we know, the choice of an optimal $b_0$ has
not been addressed before. In what follows, $a_n \asymp b_n$ means that $a_n = O(b_n)$ and $b_n = O(a_n)$, i.e. that there
is a constant $C > 0$ such that $|a_n|/C \leq |b_n| \leq C|a_n|$ for $n$ large enough.

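Theorem 4.2. Suppose that $(A_1)$-$(A_5)$ and $(A_7)$-$(A_{10})$ are satisfied, and define

$$b_0^* = b_0^*(b_1) = \arg\min_{b_0} R_n(b_0, b_1),$$

where the minimization is performed over bandwidths $b_0$ fulfilling $(A_9)$. Then the bandwidth $b_0^*$ satisfies

$$b_0^* \asymp \max\left\{\left(\frac{1}{n^2 b_1^3}\right)^{\frac{1}{d+4}}, \left(\frac{1}{n^3 b_1^7}\right)^{\frac{1}{2d+4}}\right\},$$

and we have

$$R_n(b_0^*, b_1) \asymp \max\left\{\left(\frac{1}{n^2 b_1^3}\right)^{\frac{4}{d+4}}, \left(\frac{1}{n^3 b_1^7}\right)^{\frac{4}{2d+4}}\right\}.$$
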

Our next theorem gives the conditions under which the estimator $\hat f_{1n}(\varepsilon)$ reaches the optimal rate $n^{-2/5}$
when $b_0$ takes the value $b_0^*$. We prove that for $d \leq 2$, the bandwidth that minimizes the term $AMSE(b_1) +
R_n(b_0^*, b_1)$ has the same order as $n^{-1/5}$, yielding the optimal order $n^{-2/5}$ for $(AMSE(b_1) + R_n(b_0^*, b_1))^{1/2}$.
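Theorem 4.3. Assume that $(A_1)$-$(A_5)$ and $(A_7)$-$(A_{10})$ are satisfied, and set

$$b_1^* = \arg\min_{b_1}\left(AMSE(b_1) + R_n(b_0^*, b_1)\right),$$

where $b_0^* = b_0^*(b_1)$ is defined as in Theorem 4.2. Then
i. For $d \leq 2$, the bandwidth $b_1^*$ satisfies

$$b_1^* \asymp \left(\frac{1}{n}\right)^{\frac{1}{5}},$$

and we have

$$\left(AMSE(b_1^*) + R_n(b_0^*, b_1^*)\right)^{\frac{1}{2}} \asymp \left(\frac{1}{n}\right)^{\frac{2}{5}}.$$

ii. For $d \geq 3$, $b_1^*$ satisfies

$$b_1^* \asymp \left(\frac{1}{n}\right)^{\frac{3}{2d+11}},$$

and we have

$$\left(AMSE(b_1^*) + R_n(b_0^*, b_1^*)\right)^{\frac{1}{2}} \asymp \left(\frac{1}{n}\right)^{\frac{6}{2d+11}}.$$
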



The results of Theorem 4.3 show that the rate $n^{-2/5}$ is reachable if and only if $d \leq 2$. These results are
derived from Theorem 4.2. The latter indicates that if $b_1$ is proportional to $n^{-1/5}$, the bandwidth $b_0^*$ has
the same order as

$$\max\left\{\left(\frac{1}{n}\right)^{\frac{7}{5(d+4)}}, \left(\frac{1}{n}\right)^{\frac{8}{5(2d+4)}}\right\} = \left(\frac{1}{n}\right)^{\frac{8}{5(2d+4)}}.$$

For $d \leq 2$, this order of $b_0^*$ is smaller than the one of the optimal bandwidth $b_{0*}$ obtained for pointwise or
mean square estimation of $m(\cdot)$ using a Kernel estimator. In fact, it has been shown in Nadaraya (1989,
Chapter 4) that the optimal bandwidth $b_{0*}$ for estimating $m(\cdot)$ is obtained by minimizing the order of the
risk function

$$r_n(b_0) = E\left[\int \mathbf{1}(x \in \mathcal{X}) \left(\hat m_n(x) - m(x)\right)^2 \hat g_n^2(x)\, w(x)\,dx\right],$$

where $\hat g_n(x)$ is a nonparametric Kernel estimator of $g(x)$, and $w(\cdot)$ is a nonnegative weight function, which
is bounded and square integrable on $\mathcal{X}$. If $g(\cdot)$ and $m(\cdot)$ have continuous second order partial derivatives
over their supports, Nadaraya (1989, Chapter 4) shows that $r_n(b_0)$ has the same order as $b_0^4 + 1/(n b_0^d)$,
leading to the optimal bandwidth $b_{0*} \asymp n^{-1/(d+4)}$ for the convergence of the estimator $\hat m_n(\cdot)$ of $m(\cdot)$ in the
set of the square integrable functions on $\mathcal{X}$.
For $d = 1$, the optimal order of $b_0^*$ is $n^{-(1/5) \times (4/3)}$, which goes to $0$ slightly faster than $n^{-1/5}$, the optimal order
of the bandwidth $b_{0*}$ for the mean square nonparametric estimation of $m(\cdot)$.

For $d = 2$, the optimal order of $b_0^*$ is $n^{-1/5}$. Again this order goes to $0$ faster than the order $n^{-1/6}$ of the
optimal bandwidth for the nonparametric estimation of the regression function with two covariates.

However, for $d \geq 3$, we note that the order of $b_0^*$ goes to $0$ more slowly than $b_{0*}$. Hence our results show that
the optimal $\hat m_n(\cdot)$ for estimating $f(\cdot)$ should use a very small bandwidth $b_0$. This suggests that $\hat m_n(\cdot)$ should be
less biased and should have a higher variance than the optimal Kernel regression estimator of the estimation
setup. Such a finding parallels Wang, Cai, Brown and Levine (2008), who show that a similar result holds
when estimating the conditional variance of a heteroscedastic regression error term. However, Wang et al.
(2008) do not give the order of the optimal bandwidth to be used for estimating the regression function in
their heteroscedastic setup. These results show that estimators of $m(\cdot)$ with smaller bias should be preferred
in our framework, compared to the case where the regression function $m(\cdot)$ is the parameter of interest.
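
The comparison of bandwidth orders for $d \leq 2$ can be checked with exact arithmetic. The sketch below assumes $b_1 \asymp n^{-1/5}$ and uses the two candidate orders $(n^2 b_1^3)^{-1/(d+4)}$ and $(n^3 b_1^7)^{-1/(2d+4)}$ from Theorem 4.2; the function names are ours. Since the maximum of two powers of $1/n$ is the one with the smaller exponent, the code returns the minimum of the two exponents.

```python
from fractions import Fraction


def first_step_exponent(d: int, b1_exponent: Fraction = Fraction(1, 5)) -> Fraction:
    """Exponent a with b0* of order n^(-a), assuming b1 of order n^(-b1_exponent)."""
    a1 = (2 - 3 * b1_exponent) / (d + 4)      # from (1 / (n^2 b1^3))^(1/(d+4))
    a2 = (3 - 7 * b1_exponent) / (2 * d + 4)  # from (1 / (n^3 b1^7))^(1/(2d+4))
    return min(a1, a2)  # the larger of the two bandwidth orders dominates


def regression_exponent(d: int) -> Fraction:
    """Exponent of the mean-square-optimal regression bandwidth n^(-1/(d+4))."""
    return Fraction(1, d + 4)


if __name__ == "__main__":
    for d in (1, 2):
        print(d, first_step_exponent(d), regression_exponent(d))
```

A larger exponent means a bandwidth that shrinks faster, so for $d = 1$ and $d = 2$ the first-step bandwidth is of smaller order than the regression-optimal one, in line with the discussion above.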

4.3 Asymptotic normality 

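We now give the asymptotic normality of the estimator $\hat f_{1n}(\varepsilon)$.
Theorem 4.4. Assume that

$$(A_{11}): \quad n b_0^{d+4} = O(1), \quad n b_0^4 b_1 = o(1), \quad n b_0^d b_1^3 \to \infty,$$

when $n$ goes to $\infty$. Then under $(A_1)$-$(A_5)$, $(A_7)$-$(A_{10})$, we have

$$\sqrt{n b_1}\left(\hat f_{1n}(\varepsilon) - \bar f_{1n}(\varepsilon)\right) \xrightarrow{d} N\left(0, \frac{f(\varepsilon)}{P(X \in \mathcal{X}_0)} \int K_1^2(v)\,dv\right),$$

where

$$\bar f_{1n}(\varepsilon) = f(\varepsilon) + \frac{b_1^2}{2} f^{(2)}(\varepsilon) \int v^2 K_1(v)\,dv + o\left(b_1^2\right).$$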

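The result of this theorem shows that the best choice $b_1^*$ for the bandwidth $b_1$ should achieve the minimum
of the Asymptotic Mean Integrated Square Error

$$AMISE = \frac{b_1^4}{4} \int \left(f^{(2)}(\varepsilon)\right)^2 d\varepsilon \left(\int v^2 K_1(v)\,dv\right)^2 + \frac{1}{n b_1 P(X \in \mathcal{X}_0)} \int K_1^2(v)\,dv,$$

leading to the optimal bandwidth

$$b_1^* = \left[\frac{\int K_1^2(v)\,dv}{P(X \in \mathcal{X}_0) \int \left(f^{(2)}(\varepsilon)\right)^2 d\varepsilon \left(\int v^2 K_1(v)\,dv\right)^2}\right]^{1/5} n^{-1/5}.$$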
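We also note that for $d \leq 2$, $b_1 = b_1^*$ and $b_0 = b_0^*$, Theorems 4.3 and 4.2 give

$$b_1 \asymp \left(\frac{1}{n}\right)^{\frac{1}{5}}, \quad b_0 \asymp \left(\frac{1}{n}\right)^{\frac{8}{5(2d+4)}},$$

which yields that

$$n b_0^{d+4} \asymp \left(\frac{1}{n}\right)^{\frac{12-2d}{5(2d+4)}}, \quad n b_0^d b_1^3 \asymp \left(\frac{1}{n}\right)^{\frac{4d-8}{5(2d+4)}}.$$

This shows that for $d = 1$, $(A_{11})$ is realizable with the optimal bandwidths $b_0^*$ and $b_1^*$. But with these
bandwidths, the last constraint of $(A_{11})$ is not satisfied for $d = 2$, since $n b_0^d b_1^3$ is bounded when $n \to \infty$.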
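
The compatibility of $(A_{11})$ with the optimal bandwidths can be verified by tracking powers of $n$. The sketch below is an illustration under our reading of $(A_{11})$ as the three conditions $n b_0^{d+4} = O(1)$, $n b_0^4 b_1 = o(1)$ and $n b_0^d b_1^3 \to \infty$, with $b_0 \asymp n^{-8/(5(2d+4))}$ and $b_1 \asymp n^{-1/5}$ as stated for $d \leq 2$; the helper names are hypothetical.

```python
from fractions import Fraction


def a11_exponents(d: int):
    """Return (e1, e2, e3) such that n*b0^(d+4), n*b0^4*b1 and n*b0^d*b1^3
    are of order n^e1, n^e2 and n^e3 respectively."""
    b0 = Fraction(8, 5 * (2 * d + 4))  # b0 is of order n^(-b0)
    b1 = Fraction(1, 5)                # b1 is of order n^(-1/5)
    return (1 - (d + 4) * b0, 1 - 4 * b0 - b1, 1 - d * b0 - 3 * b1)


def a11_holds(d: int) -> bool:
    """O(1) needs e1 <= 0, o(1) needs e2 < 0, divergence needs e3 > 0."""
    e1, e2, e3 = a11_exponents(d)
    return e1 <= 0 and e2 < 0 and e3 > 0


if __name__ == "__main__":
    for d in (1, 2):
        print(d, a11_exponents(d), a11_holds(d))
```

For $d = 2$ the third exponent is exactly zero, which corresponds to the bounded product mentioned above.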



5 Conclusion 

The aim of this paper was to study the nonparametric Kernel estimation of the probability density function
of the regression error using the estimated residuals. The difference between the feasible estimator, which
uses the estimated residuals, and the unfeasible one, which uses the true residuals, is studied. An optimal choice of
the first-step bandwidth used to estimate the residuals is also proposed. Finally, the asymptotic normality of
the feasible Kernel estimator and its rate-optimality are established. One of the contributions of this paper
is the analysis of the impact of the estimated residuals on the Kernel estimator of the regression error p.d.f.

In our setup, the strategy was to use an approach based on a two-step procedure which, in a first
step, replaces the unobserved residual terms by some nonparametric estimators $\hat\varepsilon_i$. In a second step, the
"pseudo-observations" $\hat\varepsilon_i$ are used to estimate the p.d.f. $f(\cdot)$, as if they were the true $\varepsilon_i$'s. While proceeding so
can remedy the curse of dimensionality, a challenging issue was to measure the impact of the estimated
residuals on the final estimator of $f(\cdot)$ in the first nonparametric step, and to find the order of the optimal
first-step bandwidth $b_0$. For this choice of $b_0$, our results indicate that the optimal bandwidth to be used
for estimating the regression function $m(\cdot)$ should be smaller than the optimal bandwidth for the mean
square estimation of $m(\cdot)$. That is to say, the best estimator $\hat m_n(\cdot)$ of the regression function $m(\cdot)$ needed
for estimating $f(\cdot)$ should have a lower bias and a higher variance than the optimal Kernel regression of the
estimation setup. With this appropriate choice of $b_0$, it has been seen that for $d \leq 2$, the nonparametric
estimator $\hat f_{1n}(\varepsilon)$ of $f$ can reach the optimal rate $n^{-2/5}$, which corresponds to the exact consistency rate
reached for the Kernel density estimator of a real-valued variable. Hence our main conclusion is that for $d \leq 2$,
the estimator $\hat f_{1n}(\varepsilon)$ used for estimating $f(\varepsilon)$ is not affected by the curse of dimensionality, since there is no
negative effect coming from the estimation of the residuals on the final estimator of $f(\varepsilon)$.

6 Proofs section

Intermediate Lemmas for Proposition 2.1 and Theorem 4.1
Lemma 6.1. Define, for $x \in \mathcal{X}_0$,




Then under (Ai) — (A2), (A4), (A7) and (Ag), we have, when bg goes to 




sup \g n (x) - g{x)\ = O (bl) , sup |<? n (a;) - g n (x) \ = P [bf ) + 




and 




Lemma 6.2. Under $(A_1)$-$(A_4)$, $(A_7)$ and $(A_9)$, we have




Lemma 6.3. Define for $(x, y) \in \mathbb{R}^d \times \mathbb{R}$,




Then under (A\) — (A3), (Ao) — (^9), we have, when n goes to infinity, 




1/2 
Lemma 6.4. Set, for $(x, y) \in \mathbb{R}^d \times \mathbb{R}$,

$$\varphi_{in}(x, y) = \frac{1}{h_0^d h_1} K_0\left(\frac{X_i - x}{h_0}\right) K_1\left(\frac{Y_i - y}{h_1}\right).$$



Then, under $(A_6)$-$(A_8)$, we have, for $x$ in $\mathcal{X}_0$ and $y$ in $\mathbb{R}$, $h_0$ and $h_1$ going to $0$, and for some constant
$C > 0$,

hld 2 ip(x,y) f T h\d 2 ip(x,y) 



E[ip in (x,y)] -ip(x,y) 



Var [ip in (x, y)} = 



E 



Win (x,y) -Etp in (x,y)\ c 



■J Z K ( z)z - dz + ^ d -^l Jv 2 Ki(v)dv 



< 



2 d 2 x 
+ o(hl + h\), 



i pQ e x ) v i - . 



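Lemma 6.5. Set

$$f_{in}(\varepsilon) = \frac{\mathbf{1}(X_i \in \mathcal{X}_0)}{b_1 P(X \in \mathcal{X}_0)} K_1\left(\frac{\varepsilon_i - \varepsilon}{b_1}\right).$$

Then under $(A_4)$, $(A_5)$ and $(A_8)$, we have, for $b_1$ going to $0$, and for some constant $C > 0$,

$$E f_{in}(\varepsilon) = f(\varepsilon) + \frac{b_1^2}{2} f^{(2)}(\varepsilon) \int v^2 K_1(v)\,dv + o\left(b_1^2\right),$$

$$Var\left(f_{in}(\varepsilon)\right) = \frac{f(\varepsilon)}{b_1 P(X \in \mathcal{X}_0)} \int K_1^2(v)\,dv + o\left(\frac{1}{b_1}\right),$$

$$E\left|f_{in}(\varepsilon) - E f_{in}(\varepsilon)\right|^3 \leq \frac{C f(\varepsilon)}{b_1^2 P^2(X \in \mathcal{X}_0)} \int |K_1(v)|^3\,dv + o\left(\frac{1}{b_1^2}\right).$$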



Lemma 6.6. Define 

n , 

S n = Y, 1 ( X ^ X o)(rh ln -m(X l ))K[ 1) ( 
i=i ^ 



R n = Y^t(X l eX )(m m -m(X l )f (l-t) 2 K\ 3 

Jo 



i=l 



2 ^(3) / - f(min - m(Xj)) - e 



61 



12 



Then under (A\) — (A 5 ) and (A-j) — (A w ), we have, for b and bi small enough, 

bl (n6? + (n6 1 ) 1 / 2 ) + (^ + |) 1/21 





= o 


T 


= Oi 


Rn 


= o 



nbfij 



/ i \ 3/2' 

(nbl+(nXh) 1/2 ) U + ^ 



Lemma 6.7. Under (A 5 ) and (A$) we have, for some constant C > 0, and for any e inM. and p e [0, 2], 



K\ l) [^\ e p f(e)de 



I* 



bi 



e — e 
~bV 



e p f(e)de 



e p f(e)de 



<Cb u 
<Cb u 
<Ch, 



bi 

<■>, I e-e 



J* 



e — e 
~bV 



e p f(e)de 



°f(e)de 



e p f(e)de 



<Cbf, 
<Cbl 
< Cb\. 



(6.1) 
(6.2) 
(6.3) 



Lemma 6.8. Set 



Pin = 1 {X ! d t X ° ] £ (m{X )- m { Xl ))KJ 

n °09in . 1 . , . V 



Xj — Xi 



b 



Then, under (A±) — (A5) and (Aj) — (A10), we have, when bo and bi go to 0, 

£[} in KV ( £ -ip) = P (bl) (nb\ + (nbi)^) . 



1=1 



Lemma 6.9. Set 



Then, under (Ai) — (A 5 ) and (A 7 ) — (A w ), we have 



bo 



(l) / £i 



i=l 



bi 



1/2 
Lemma 6.10. Let $E_n[\cdot]$ be the conditional mean given $X_1, \ldots, X_n$. Then under (Ai) — (A^) and (A-?) — (Ag) ,
we have, for $b_0$ going to $0$,



sup E r , 

Ki<n 



sup E r , 

Ki<n 



1 (Xi e Xq) (m m - m{X t )Y 
1 (A, e Xq) (m m - ? n(A l )) 6 



= Op ( b A Q 



= Op ft 



1 



Lemma 6.11. Assume that $(A_4)$ and $(A_7)$ hold. Then, for any $1 \leq i \neq j \leq n$, and for any $\varepsilon$ in $\mathbb{R}$,

$(\hat m_{in} - m(X_i), \varepsilon_i)$ and $(\hat m_{jn} - m(X_j), \varepsilon_j)$
are independent given $X_1, \ldots, X_n$, provided that $\|X_i - X_j\| \geq C b_0$, for some constant $C > 0$.

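Lemma 6.12. Let $Var_n(\cdot)$ and $Cov_n(\cdot)$ be respectively the conditional variance and the conditional covariance
given $X_1, \ldots, X_n$, and set

$$\zeta_{in} = \mathbf{1}(X_i \in \mathcal{X}_0) \left(\hat m_{in} - m(X_i)\right)^2 K_1^{(2)}\left(\frac{\varepsilon_i - \varepsilon}{b_1}\right).$$
Then under $(A_1)$-$(A_5)$ and $(A_7)$-$(A_9)$, we have, for $n$ going to infinity,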



n / 1 \ 2 

53 Var « (Cin) = Or (nh) ( bl + — d J , 



i=l 



n n / 1 \ * 

53 ^ Cov„ (C m , Cin) = P (n 2 & d 6p) + \ 
i—l 3=1 ^ ' 



All these lemmas are proved in Appendix A. 



Proof of Proposition 2.1

Define $\tilde f_n(\varepsilon|x)$ as in Lemma 6.3 and note that by this lemma, we have



fn(e\x) = f n (e\x) + 1 



1 



1/2 



(6.4) 



The asymptotic distribution of the first term in (6.4) is derived by applying the Lyapounov Central Limit
Theorem for triangular arrays (see e.g. Billingsley 1968, Theorem 7.3). Define for $x \in \mathcal{X}_0$ and $y \in \mathbb{R}$,



Yj-y 
and observe that



f / i \ (x,m{x) +e) 

MX) = g^Jx) ■ (6 - 5) 

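Let now $\varphi_{in}(x, y)$ be as in Lemma 6.4 and note that

$$\tilde\varphi_n(x, y) = \frac{1}{n} \sum_{i=1}^n \left(\varphi_{in}(x, y) - E\left[\varphi_{in}(x, y)\right]\right) + E\left[\varphi_{1n}(x, y)\right]. \tag{6.6}$$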

The second and third inequalities in Lemma T6.4I give, since h^hi goes to 0, 

M^Eg^ < W//l^-Wl'^^(^) _ ^ _ o(1) 
E^Va, [«„<„)]) ^JJw.rt^ + ofa)) 

Hence the Lyapounov Central Limit Theorem gives, since $n h_0^d h_1$ diverges under $(A_0)$,

E"=l {&n{x,y) -E[fr n (x,y)]} d 



(E"=i Var[^ m (a;,y)]) 

so that 



1/2 



AA(0,1), 



X -Y^($in{x,y) -E[& n {x,y)]} A AT (o,<p(x,y) J J K 2 Q {z)Kl{v)dzd^j . (6.7) 



Further, a proof similar to the one of Lemma 6.1 gives



1 1 ( , a lnn^ 



Op + . (6.8) 



9n(x) g(x) V «^o, 

Hence by this equality, it follows that, taking $y = m(x) + \varepsilon$ in (6.7), and by (6.4)-(6.6),

V / ^i'(7„(e|^)-7„(e|^)) 4a^0,^^ J J Kl(z)Kl(v)dzdv 

where 

7 / M _ gjgln (x,m(a;) + e)] 

This yields the result of Proposition 2.1 since the first equality of Lemma 6.4 and (6.8) yield, for $h_0$ and $h_1$
small enough,

/n( e F) = ^( e W + <W^) 0^ 7 zif (z)z dz 

/l? d 2 Lp(x, mix) + e) { n Tr , x , 1,0 , 2 \ „ 

' iTld(i;)du + (/Iq + /if) .□ 
Proof of Theorem 4.1

The proof of the theorem is based upon the following equalities: 



/in(e) - fin(e) 



Op 



68- 



1 1 



Op 



1 



b d \ 1/2 ' 



1/2 



o P 



(n6f)!/2 

3/2 



(6.9) 



and 



hn(e)-f(e) = P (bj+ — 



1/2 



(6.10) 



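Indeed, since

$\hat f_{1n}(\varepsilon) - f(\varepsilon) = \left(\tilde f_{1n}(\varepsilon) - f(\varepsilon)\right) + \hat f_{1n}(\varepsilon) - \tilde f_{1n}(\varepsilon)$, it then follows by (6.10) and (6.9) that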

1/2 



/m(e) - /(e) 



Of 



1 1 



n6i 



n n 



2 b^b\ \{nb\y/ 2 \b\) 



MV /2N 



Op 



fii \b\) 



1/2 



This yields the result of the Theorem, since under $(A_9)$ and $(A_{10})$, we have



1 



= o 



1 



1 



= o(M)( ( v 



1 



,n&i/' n 2 b%bl V&i, 

Hence, it remains to prove (6.9) and (6.10). For this, define $S_n$, $R_n$ and $T_n$ as in Lemma 6.6. Since
$\hat\varepsilon_i - \varepsilon_i = -(\hat m_{in} - m(X_i))$ and $K_1$ is three times continuously differentiable under $(A_8)$, the third-order
Taylor expansion with integral remainder gives



/i™(e) - fin(e) 



1 



^l(X 4 e^ ) 



A'i 



£z - e 



^1 



£i - e 
61 



$n T n R n 



Therefore, since 



^ 1 (x 4 e * ) = n (P (X e Ab) + op(i)) , 
by the Law of large numbers, Lemma 6.6 then gives



fin(e) - /in(e) = Of 



1 

nb\ 



6 1 



1 



S n + Of 



1 



nb\ 
1 



T n + Of 

1/21 



nb\ 



Of 



(nbf) 1 / 2 J \n n 2 b<*b\ 
1 



(nb\y/ 2 \b\ 



b d^ 1/2' 



nb% 



Of 



1 



&d N 1/2' 

61 



3/2 



<7 



This yields f|6 .9(1 . since under (Ag) and (Aio), we have &o - * 0, rt&Q +2 — > oo and n&f — > oo, so that 



6 1 



(rife?) 1 / 2 



mV /2 



(n&f) 1 / 2 \b 3 J 
For (6.10), note that



nb$ 



(nbl) 1 / 2 ' \b 3 J 



1/2' 



(/m(e)-/(e) 



Var, 



(/m(e)) 



fin(e) -/(e) , 



with, using (A4), 

Var„ (fin(e) 



Therefore, since the Cauchy-Schwarz inequality gives 



^ 1 (Xi G Xq ) Var 



e — e 
~6T 



Var 



£ — £ 



< 



e — e 



< b x \ K 2 (v)f(e + b lV )dv, 



this bound and the equality above yield, under (A$) and {Ag), 

C 



Var n (/ ln ( £ ))<^ 1(l!e , o) 
For the second term in (6.11), we have



Of 



E„ 



fin(e) 



1 



^i(v,e ^ )E 



1 

nb\ 



e — e 



(6.11) 



(6.12) 



(6.13) 



By $(A_8)$, $K_1$ is symmetric, has a compact support, with $\int v K_1(v)\,dv = 0$ and $\int K_1(v)\,dv = 1$. Therefore, since
under $(A_5)$ $f$ has bounded continuous second order derivatives, this yields for some $\theta = \theta(\varepsilon, b_1 v)$,

' e — e N 



IE 



bi 

b 3 

bif(e) + £ 



= 61 / iTi(u)/(e + 6iu)dw 



f(e) + b lV f^(e) + ^ff^(< ! W V ] 
u 2 ifi(u)/( 2 )(e + 0&it;)<fo- 
Hence this equality and (6.13) give

E 

so that 



/in(e)] =f(t) + i I v 2 K 1 (v)f^{e + 6b 1 v)dv, 



fm(e) -/(e) = O p (bf) 



Combining this result with (6.12) and (6.11), we obtain, by the Tchebychev inequality,



fm(e)-f(e) = O r (bf + 



nbi 



1/2 



This proves (6.10), and then completes the proof of the theorem.



Proof of Theorem 4.2

Recall that 



R n (bo,bi) = 
and note that 



1 



(n6f)V2 \bl 



d\ 1/2- 



d\ 1/2' 



1 \ d+ 



1 \ d + 



n?b\ 



oo. To find the order of 6^, we shall deal with the cases nb 



d+4 



and 



n 2 b\J I \n 2 b\ 

if and only if n 4 ~ d b d+16 
nb d+i = 0(1). 

First assume that More precisely, we suppose that bo is in [(u n / n) 1 ^ d+4 \ +oo), where u n — >• oo 

Since l/(n6g) = 0(&q) f° r au these we have 



'° + nbi 



6^ 



1 



o 



u 0/ V "^0 

Hence the order of &q is computed by minimizing the function 



bo -> 6q 



1 



1 



hi W 



V2 



Since this function is increasing with $b_0$, the minimum of $R_n(\cdot, b_1)$ is achieved for $b_{0*} = (u_n/n)^{1/(d+4)}$. We
shall prove later on that this choice of $b_{0*}$ is irrelevant compared to the one arising when $n b_0^{d+4} = O(1)$.
Consider now the case $n b_0^{d+4} = O(1)$, i.e. $b_0^4 = O\left(1/(n b_0^d)\right)$. This gives



{nb\y/z \b\ 



d\ 1/2- 



+ 7^ 



d\ 1/2- 



1 /) d 
_^ |_ ^0 

nb\ b\ 

b\ + b\ 



n 2 b 2d 



Moreover if nb d b\ — > oo, we have, since n&o d — > oo under (Ag), 



id 



nb\ b\) Wbf) b\ \n 2 b 2d 



yd 
°0 



1 + M 

6? &T 



n^bf 



o( b 4 



bl 



2 b 2d 



Hence the order of $b_0^*$ is obtained by finding the minimum of the function $b_0^4 + 1/(n^2 b_0^d b_1^3)$. The minimization
of this function gives a solution $b_0$ such that

$$b_0 \asymp \left(\frac{1}{n^2 b_1^3}\right)^{\frac{1}{d+4}}.$$

This value satisfies the constraints $n b_0^{d+4} = O(1)$ and $n b_0^d b_1^4 \to \infty$ when $n^{4-d} b_1^{d+16} \to \infty$.
If now n&^ +4 = 0(1) but n6$& 4 = 0(1), we have, since rifc^ — > oo, 



1 

nb\ 



n 2 b 2d 



-1 



n 3 ^ 



ft? V^ 3 &o d 



o 



n 2 6 2<i 



O 



b\ 



n%l d 



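In this case, $b_0^*$ is obtained by minimizing the function $b_0^4 + 1/(n^3 b_0^{2d} b_1^7)$, for which the solution $b_0$ verifies

$$b_0 \asymp \left(\frac{1}{n^3 b_1^7}\right)^{\frac{1}{2d+4}}, \quad R_n(b_0, b_1) \asymp \left(\frac{1}{n^3 b_1^7}\right)^{\frac{4}{2d+4}}.$$

This solution fulfills the constraint $n b_0^d b_1^4 = O(1)$ when $n^{4-d} b_1^{d+16} = O(1)$. Hence we can conclude that for
$b_0^4 = O\left(1/(n b_0^d)\right)$, the bandwidth $b_0^*$ satisfies

$$b_0^* \asymp \max\left\{\left(\frac{1}{n^2 b_1^3}\right)^{\frac{1}{d+4}}, \left(\frac{1}{n^3 b_1^7}\right)^{\frac{1}{2d+4}}\right\},$$

which leads to

$$R_n(b_0^*, b_1) \asymp \max\left\{\left(\frac{1}{n^2 b_1^3}\right)^{\frac{4}{d+4}}, \left(\frac{1}{n^3 b_1^7}\right)^{\frac{4}{2d+4}}\right\}.$$

We need now to compare the solution $b_0^*$ to the candidate $b_{0*} = (u_n/n)^{1/(d+4)}$ obtained when $n b_0^{d+4} \to \infty$.
For this, we must do a comparison between the orders of $R_n(b_0^*, b_1)$ and $R_n(b_{0*}, b_1)$. Since $R_n(b_0, b_1) \geq b_0^4$,
we have $R_n(b_{0*}, b_1) \geq (u_n/n)^{4/(d+4)}$, so that, for $n$ large enough,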



Rn(b*gM) 

Rn{bo*,h) 



c 



n 2 b\ 



d+4 



+ 



n 3 bl 



d+4 



4(d+8) 

(2,; i i)(d+4) 



7(d + 4) 
d + 8 



= o(l), 
using $u_n \to \infty$ and that $n^{(d+8)} b_1^{7(d+4)} \to \infty$ by $(A_{10})$. This shows that $R_n(b_0^*, b_1) \leq R_n(b_{0*}, b_1)$ for $n$ large
enough. Hence the Theorem is proved, since $b_0^*$ is the best candidate for the minimization of $R_n(\cdot, b_1)$. □

Proof of Theorem 4.3

Recall that Theorem 4.2 gives

$$AMSE(b_1) + R_n(b_0^*, b_1) \asymp r_1(b_1) + r_2(b_1) + r_3(b_1) = F(b_1),$$

where

$$r_1(h) = h^4 + \frac{1}{nh}, \qquad \arg\min r_1(h) \asymp n^{-1/5} = h_1^*, \qquad \min r_1(h) \asymp (h_1^*)^4 = n^{-4/5},$$

$$r_2(h) = h^4 + \left(\frac{1}{n^2 h^3}\right)^{\frac{4}{d+4}}, \qquad \arg\min r_2(h) \asymp n^{-\frac{2}{d+7}} = h_2^*, \qquad \min r_2(h) \asymp (h_2^*)^4 = n^{-\frac{8}{d+7}},$$

$$r_3(h) = h^4 + \left(\frac{1}{n^3 h^7}\right)^{\frac{4}{2d+4}}, \qquad \arg\min r_3(h) \asymp n^{-\frac{3}{2d+11}} = h_3^*, \qquad \min r_3(h) \asymp (h_3^*)^4 = n^{-\frac{12}{2d+11}}.$$

Each $r_j(h)$ decreases on $[0, \arg\min r_j(h)]$ and increases on $(\arg\min r_j(h), \infty)$, with $r_j(h) \asymp h^4$ on $(\arg\min r_j(h), \infty)$. Moreover $\min r_2(h) = o(r_3(h))$ and $h_2^* = o(h_3^*)$ for all possible dimensions $d$, so that

$$\min\{r_2(h) + r_3(h)\} \asymp (h_3^*)^4 = n^{-\frac{12}{2d+11}} \quad \text{and} \quad \arg\min\{r_2(h) + r_3(h)\} \asymp h_3^* = n^{-\frac{3}{2d+11}}.$$

Observe now that $\min\{r_2(h) + r_3(h)\} = O(\min r_1(h))$ is equivalent to $n^{-\frac{12}{2d+11}} = O(n^{-4/5})$, which holds if and only if $d \leq 2$. Hence assume that $d \leq 2$. Since $n^{-\frac{12}{2d+11}} = O(n^{-4/5})$ also gives $\arg\min\{r_2(h) + r_3(h)\} \asymp h_3^* = O(h_1^*)$, we have

$$\min F(b_1) \asymp n^{-4/5} \quad \text{and} \quad \arg\min F(b_1) \asymp n^{-1/5}.$$

The case $d > 2$ is symmetric, with

$$\min F(b_1) \asymp n^{-\frac{12}{2d+11}} \quad \text{and} \quad \arg\min F(b_1) \asymp n^{-\frac{3}{2d+11}}.$$

This ends the proof of the Theorem. □
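The exponent bookkeeping in this proof can be checked with exact rational arithmetic. The sketch below is only an illustration: the function names are hypothetical, and each exponent $a$ is the solution of the balance equation obtained by equating $h^4$ with the second term of $r_j(h)$ at $h = n^{-a}$.

```python
from fractions import Fraction


def rate_exponents(d: int) -> dict:
    """Exponents a_j with h_j* ~ n**(-a_j); then min r_j ~ n**(-4 * a_j)."""
    return {
        "h1": Fraction(1, 5),           # h^4 ~ 1/(n h)
        "h2": Fraction(2, d + 7),       # h^4 ~ (n^2 h^3)^(-4/(d+4))
        "h3": Fraction(3, 2 * d + 11),  # h^4 ~ (n^3 h^7)^(-4/(2d+4))
    }


def optimal_b1_exponent(d: int) -> Fraction:
    """Exponent a with argmin F(b1) ~ n**(-a): the slowest-vanishing bandwidth wins."""
    e = rate_exponents(d)
    # h2* = o(h3*) for every d, so only h1* and h3* compete.
    return min(e["h1"], e["h3"])
```

For instance, `optimal_b1_exponent(1)` and `optimal_b1_exponent(2)` both return `Fraction(1, 5)`, while `optimal_b1_exponent(3)` returns `Fraction(3, 17)`, in line with the threshold $d \leq 2$ found above.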

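Proof of Theorem 4.4

Observe that the Tchebychev inequality gives

$$\sum_{i=1}^n 1(X_i \in \mathcal{X}_0) = n P(X \in \mathcal{X}_0)\left[1 + O_P\left(\frac{1}{\sqrt{n}}\right)\right],$$

so that

$$\tilde f_{1n}(\epsilon) = \left[1 + O_P\left(\frac{1}{\sqrt{n}}\right)\right] f_n(\epsilon),$$

where

$$f_n(\epsilon) = \frac{1}{n b_1 P(X \in \mathcal{X}_0)} \sum_{i=1}^n 1(X_i \in \mathcal{X}_0) K_1\left(\frac{\varepsilon_i - \epsilon}{b_1}\right).$$

Therefore

$$\hat f_{1n}(\epsilon) - E f_n(\epsilon) = \left(f_n(\epsilon) - E f_n(\epsilon)\right) + \left(\hat f_{1n}(\epsilon) - \tilde f_{1n}(\epsilon)\right) + O_P\left(\frac{1}{\sqrt{n}}\right) f_n(\epsilon). \qquad (6.14)$$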

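Let now $f_{in}(\epsilon)$ be as in Lemma 6.5 and note that $f_n(\epsilon) = (1/n)\sum_{i=1}^n f_{in}(\epsilon)$. The second and the third claims in Lemma 6.5 yield, since $b_1$ goes to 0 under $(A_{10})$,

$$\frac{\sum_{i=1}^n E\left|f_{in}(\epsilon) - E f_{in}(\epsilon)\right|^3}{\left(\sum_{i=1}^n \mathrm{Var} f_{in}(\epsilon)\right)^{3/2}} = O\left(\frac{n b_1^{-2}}{(n b_1^{-1})^{3/2}}\right) = O\left((n b_1)^{-1/2}\right) = o(1).$$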



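Hence the Lyapounov Central Limit Theorem gives, since $n b_1$ diverges under $(A_{10})$,

$$\frac{f_n(\epsilon) - E f_n(\epsilon)}{\sqrt{\mathrm{Var} f_n(\epsilon)}} = \frac{f_n(\epsilon) - E f_n(\epsilon)}{\sqrt{\mathrm{Var} f_{in}(\epsilon)/n}} \stackrel{d}{\to} N(0,1),$$

which yields, using the second equality in Lemma 6.5,

$$\sqrt{n b_1}\left(f_n(\epsilon) - E f_n(\epsilon)\right) \stackrel{d}{\to} N\left(0, \frac{f(\epsilon)}{P(X \in \mathcal{X}_0)}\int K_1^2(v)\,dv\right). \qquad (6.15)$$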
Moreover, note that for n&Q&f — > oo and ntff — > oo, 



'(X G #0) 



1 / 1 



1 bt\ 2 ( 1 N 1 



•<V(0,1), 



K((v)dv . 



1 



(6.15) 



Therefore, since by Assumptions (An) and (Ag), we have b$ = (l/(n&g)), nfrgfrf — > oo and that nb^ d — » oo, 
the equality above and (|6.9p then give 

1/2 



/in(e) - /m(e) 



Of 



1 



b t + Z 



1 



1 



1 



1 , 6g 



6 2 b\) \nb$ 



1 



1/2 



Hence for b\ going to 0, we have 

Vnbl (?in(e) - 7i„(e) 



O r 



nb i [ b a + ~ + 



1/2 



op(l), 



since nfoQ&i = o(l) and that n&o& 2 — > oo under Assumption (An). Combining the above result with (|6.15p 
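and (6.14), we obtain

$$\sqrt{n b_1}\left(\hat f_{1n}(\epsilon) - E f_n(\epsilon)\right) \stackrel{d}{\to} N\left(0, \frac{f(\epsilon)}{P(X \in \mathcal{X}_0)}\int K_1^2(v)\,dv\right).$$

This ends the proof of the Theorem, since the first result of Lemma 6.5 gives

$$E f_n(\epsilon) = E f_{1n}(\epsilon) = f(\epsilon) + \frac{b_1^2}{2} f^{(2)}(\epsilon)\int v^2 K_1(v)\,dv + o\left(b_1^2\right) := \bar f_{1n}(\epsilon). \;\square$$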
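To make the object of this theorem concrete, here is a minimal simulation sketch of a feasible estimator built from leave-one-out kernel regression residuals, in dimension $d = 1$. Everything specific in it is an assumption made for illustration: the function names, the Epanechnikov kernel used for both $K_0$ and $K_1$ (which need not satisfy the smoothness conditions of the paper), the interval `[lo, hi]` playing the role of $\mathcal{X}_0$, and the bandwidth values.

```python
import math
import random


def epanechnikov(u: float) -> float:
    """Epanechnikov kernel, used here for both smoothing steps (an assumption)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def loo_nadaraya_watson(xs, ys, i, b0):
    """Leave-one-out kernel regression estimate of m(X_i) with bandwidth b0."""
    num = den = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        if j != i:
            w = epanechnikov((xj - xs[i]) / b0)
            num += w * yj
            den += w
    return num / den if den > 0.0 else float("nan")


def residual_density(xs, ys, eps, b0, b1, lo, hi):
    """Kernel density estimate at eps from estimated residuals with X_i in [lo, hi]."""
    total, count = 0.0, 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        if lo <= xi <= hi:
            m_hat = loo_nadaraya_watson(xs, ys, i, b0)
            if math.isnan(m_hat):
                continue  # no neighbour within b0: skip this observation
            total += epanechnikov((yi - m_hat - eps) / b1)
            count += 1
    return total / (b1 * count)
```

A call such as `residual_density(xs, ys, 0.0, 0.1, 0.25, 0.1, 0.9)` on data simulated from $Y = \sin(2\pi X) + \varepsilon$ with Gaussian errors gives a value close to the true error density at 0, up to the smoothing bias and sampling noise described by the theorem.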



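Appendix A: Proof of the intermediate results

Proof of Lemma 6.1

First note that by $(A_7)$, we have $\int zK_0(z)dz = 0$ and $\int K_0(z)dz = 1$. Therefore, since $K_0$ is continuous and has a compact support, $(A_1)$, $(A_2)$ and a second-order Taylor expansion yield, for $b_0$ small enough and any $x$ in $\mathcal{X}_0$,

$$\left|\bar g_n(x) - g(x)\right| = \left|\int K_0(z)\left[g(x + b_0 z) - g(x)\right]dz\right| = \left|\int K_0(z)\left[b_0 g^{(1)}(x)z + \frac{b_0^2}{2} z g^{(2)}(x + \theta b_0 z)z^T\right]dz\right|, \quad \theta = \theta(x, b_0 z) \in [0,1],$$

$$= \left|b_0 g^{(1)}(x)\int zK_0(z)dz + \frac{b_0^2}{2}\int z g^{(2)}(x + \theta b_0 z)z^T K_0(z)dz\right| = \frac{b_0^2}{2}\left|\int z g^{(2)}(x + \theta b_0 z)z^T K_0(z)dz\right| \leq C b_0^2,$$

so that

$$\sup_{x \in \mathcal{X}_0}\left|\bar g_n(x) - g(x)\right| = O\left(b_0^2\right).$$

This gives the first equality of the lemma. To prove the last two equalities in the Lemma, note that it is sufficient to show that

$$\sup_{x \in \mathcal{X}_0}\left|\hat g_n(x) - \bar g_n(x)\right| = O_P\left(\frac{\ln n}{n b_0^d}\right)^{1/2},$$

since $\bar g_n(x)$ is asymptotically bounded away from 0 over $\mathcal{X}_0$ and $|\bar g_n(x) - g(x)| = O(b_0^2)$ uniformly for $x$ in $\mathcal{X}_0$. This follows from Theorem 1 in Einmahl and Mason (2005). □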
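The $O(b_0^2)$ order of the smoothing bias obtained in this proof can be observed numerically in dimension one. The sketch below is an illustration under assumptions that are not taken from the paper: $K_0$ is the Epanechnikov kernel, $g$ is the standard normal density (which is not compactly supported), and the integral is approximated by a midpoint rule.

```python
import math


def epanechnikov(z: float) -> float:
    return 0.75 * (1.0 - z * z) if abs(z) <= 1.0 else 0.0


def normal_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def smoothing_bias(x: float, b0: float, steps: int = 2000) -> float:
    """Midpoint-rule value of the integral of K0(z) * (g(x + b0*z) - g(x)) over [-1, 1]."""
    h = 2.0 / steps
    total = 0.0
    for k in range(steps):
        z = -1.0 + (k + 0.5) * h
        total += epanechnikov(z) * (normal_pdf(x + b0 * z) - normal_pdf(x))
    return total * h
```

Dividing `smoothing_bias(0.0, b0)` by `b0**2` for decreasing values of `b0` gives ratios that stabilise near $g^{(2)}(0)\int z^2K_0(z)dz/2 \approx -0.0399$, as the second-order Taylor expansion predicts.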



Proof of Lemma 6.2

For the first equality in the lemma, set

$$\hat r_n(x) = \frac{1}{n b_0^d}\sum_{j=1}^n Y_j K_0\left(\frac{X_j - x}{b_0}\right), \qquad \bar r_n(x) = E\left[\hat r_n(x)\right],$$

and observe that

$$\sup_{x \in \mathcal{X}_0}\left|\hat m_n(x) - m(x)\right| \leq \sup_{x \in \mathcal{X}_0}\left|\hat m_n(x) - \frac{\bar r_n(x)}{\bar g_n(x)}\right| + \sup_{x \in \mathcal{X}_0}\frac{1}{|\bar g_n(x)|}\left|\bar r_n(x) - \bar g_n(x)m(x)\right|. \qquad (A.1)$$

Consider the first term of (A.1). Note that $E^{1/4}[Y^4|X = x] \leq |m(x)| + E^{1/4}[\varepsilon^4]$. The compactness of $\mathcal{X}$ from $(A_1)$, the continuity of $m(\cdot)$ from $(A_3)$ and $(A_4)$ then give that $E[Y^4|X = x] < \infty$ uniformly for $x \in \mathcal{X}$. Hence under $(A_9)$, Theorem 2 in Einmahl and Mason (2005) gives

$$\sup_{x \in \mathcal{X}_0}\left|\hat m_n(x) - \frac{\bar r_n(x)}{\bar g_n(x)}\right| = O_P\left(\frac{\ln n}{n b_0^d}\right)^{1/2}.$$

For the second term in (A.1), a second-order Taylor expansion gives, as in the proof of Lemma 6.1,

$$\sup_{x \in \mathcal{X}_0}\left|\bar r_n(x) - \bar g_n(x)m(x)\right| = O\left(b_0^2\right).$$

This gives the result of the lemma, since Lemma 6.1 implies that $\bar g_n(x)$ is bounded away from 0 over $\mathcal{X}_0$ uniformly in $x$ and for $b_0$ small enough. □



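Proof of Lemma 6.3

Note that under $(A_8)$, the Taylor expansion with integral remainder gives, for any $x \in \mathcal{X}_0$ and any integer $i \in [1, n]$,

$$K_1\left(\frac{Y_i - \hat m_n(x) - \epsilon}{h_1}\right) = K_1\left(\frac{Y_i - m(x) - \epsilon}{h_1}\right) - \frac{1}{h_1}\left(\hat m_n(x) - m(x)\right)\int_0^1 K_1^{(1)}\left(\frac{Y_i - \theta_n(x,t)}{h_1}\right)dt,$$

where $\theta_n(x,t) = m(x) + \epsilon + t\left(\hat m_n(x) - m(x)\right)$. Therefore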



fn(e\x) = f n (e\x) 



fh n (x) — m(x) 

9n(x) 



1 ™ 



Xi-x\ f 1 K p nri-o n (x,t) 



l o J Jo 



dt 



Now, observe that if Xi = z and t/£R, the change of variable e = y — m(z) + h\v gives, under (Ai) 
and (A7), 



(A.2) 
-(A 5 ) 



K 



(i) ( Yj-y 
hi 



= E 



(1) / gi + m(z) - y 



(1) / e + m{z) - y 



hi 



hi 

f(e)de 



hi / \K[ 1 \v)\f((y-m(z) + h 1 v))dv<Ch 1 . 



Hence 



sup 



/ E„ 




/o 





, (1) ( Y l - Q n {x,t) 
hi 



dt < Chi. 
With the help of this result and Lemma 6.1, we have

E„ 



1 ^ K f X i -x \ f 1 (1) ( Yi-9 n {x,t) 
Xi - x 



dt 



< 1 y 

" nh d h x jr[ 



K 



1 1 



i=i 



Kn 



h 

Xj - x 
h 



x sup 

l<i<n Jo 



Op(1), 



[ (1) f Yj-e n (x,t) 
Kl { hi 



dt 



so that 



1 ^(i) ( Yj-e n { x ,t) 
,» 1 I Si 



Hence from (A.2), (6.8), Lemma 6.2 and Assumption $(A_9)$, we deduce



./>k> I-''' 1 " (£) ( 6 o + 5 



Proof of Lemma 6.4 and Lemma 6.5



1/2 



f n (e\x) + o 



ih^hi 



1/2 



.□ 



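We just give the proof of Lemma 6.4, the proof of Lemma 6.5 being very similar. For the first equality of Lemma 6.4, note that

$$E\left[\varphi_{in}(x,y)\right] = \frac{1}{h_0^d h_1}\int\!\!\int K_0\left(\frac{x_1 - x}{h_0}\right)K_1\left(\frac{y_1 - y}{h_1}\right)\varphi(x_1, y_1)\,dx_1dy_1 = \int\!\!\int K_0(z)K_1(v)\varphi(x + h_0 z, y + h_1 v)\,dzdv.$$

A second-order Taylor expansion gives under $(A_6)$, for $z$ in the support of $K_0$, $v$ in the support of $K_1$, and $h_0$, $h_1$ small enough,

$$\varphi(x + h_0 z, y + h_1 v) - \varphi(x,y) = h_0\frac{\partial\varphi(x,y)}{\partial x}z^T + h_1\frac{\partial\varphi(x,y)}{\partial y}v + \frac{h_0^2}{2}z\frac{\partial^2\varphi(x + \theta h_0 z, y + \theta h_1 v)}{\partial^2 x}z^T$$

$$+\; h_1h_0 v\frac{\partial^2\varphi(x + \theta h_0 z, y + \theta h_1 v)}{\partial x\partial y}z^T + \frac{h_1^2}{2}\frac{\partial^2\varphi(x + \theta h_0 z, y + \theta h_1 v)}{\partial^2 y}v^2,$$

for some $\theta = \theta(x, y, h_0 z, h_1 v)$ in $[0,1]$. This gives, since $\int K_0(z)dz = \int K_1(v)dv = 1$, $\int zK_0(z)dz$ and $\int vK_1(v)dv$ vanish under $(A_7)$-$(A_8)$, and by the Lebesgue Dominated Convergence Theorem,

$$\left|E\left[\varphi_{in}(x,y)\right] - \varphi(x,y) - \frac{h_0^2}{2}\int z\frac{\partial^2\varphi(x,y)}{\partial^2 x}z^TK_0(z)dz - \frac{h_1^2}{2}\frac{\partial^2\varphi(x,y)}{\partial^2 y}\int v^2K_1(v)dv\right|$$

$$= \left|\frac{h_0^2}{2}\int\!\!\int z\left(\frac{\partial^2\varphi(x + \theta h_0 z, y + \theta h_1 v)}{\partial^2 x} - \frac{\partial^2\varphi(x,y)}{\partial^2 x}\right)z^TK_0(z)K_1(v)\,dzdv\right.$$

$$+\; h_1h_0\int\!\!\int v\left(\frac{\partial^2\varphi(x + \theta h_0 z, y + \theta h_1 v)}{\partial x\partial y} - \frac{\partial^2\varphi(x,y)}{\partial x\partial y}\right)z^TK_0(z)K_1(v)\,dzdv$$

$$\left.+\; \frac{h_1^2}{2}\int\!\!\int\left(\frac{\partial^2\varphi(x + \theta h_0 z, y + \theta h_1 v)}{\partial^2 y} - \frac{\partial^2\varphi(x,y)}{\partial^2 y}\right)v^2K_0(z)K_1(v)\,dzdv\right| = o\left(h_0^2 + h_1^2\right).$$

This proves the first equality of Lemma 6.4. The second equality in the Lemma follows similarly, since

$$\mathrm{Var}\left[\varphi_{in}(x,y)\right] = E\left[\varphi_{in}^2(x,y)\right] - \left(E\left[\varphi_{in}(x,y)\right]\right)^2 = \frac{1}{h_0^d h_1}\int\!\!\int\varphi(x + h_0 z, y + h_1 v)K_0^2(z)K_1^2(v)\,dzdv + O(1)$$

$$= \frac{\varphi(x,y)}{h_0^d h_1}\int\!\!\int K_0^2(z)K_1^2(v)\,dzdv + o\left(\frac{1}{h_0^d h_1}\right).$$

The last statement of Lemma 6.4 is immediate, since the Triangular and Convex inequalities give

$$E\left|\varphi_{in}(x,y) - E\varphi_{in}(x,y)\right|^3 \leq CE\left|\varphi_{in}(x,y)\right|^3 \leq \frac{C\varphi(x,y)}{h_0^{2d}h_1^2}\int\!\!\int\left|K_0(z)K_1(v)\right|^3dzdv + o\left(\frac{1}{h_0^{2d}h_1^2}\right).$$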



Proof of Lemma 6.6

The order of $S_n$ follows from Lemma 6.8 and Lemma 6.9. In fact, since

l{Xi G Xq) 



l(Xi G X ) (fhin - m(Xi)) = 



m(X z ))K '' X > Y ' 



&0 



Lemma 16.81 and Lemma 16.91 give 

S n = Of 
which gives the result for S n . 

For T„, define for any 1 < i < n, 



h 2 (nbl + inh) 1 ' 2 



nb i + 7d 



1/2 



E in [-] = E„ [Xx, . . .,X n ,e k , kyii]. 
Therefore, since $(\hat m_{in} - m(X_i))$ depends only upon $(X_1, \ldots, X_n, \varepsilon_k, k \neq i)$, we have



E„ [T n ] 



E, 



E, 



1 (-Y; : .V„)(»i;„ in( X;,)) 2 K[ 2) ( ^— ^ 



53 1 (Xi e #0) (m in - m(Xi)) 2 E 



(2) / ^ 



e, - e 



with, using (A4) and Lemma RT71 - (|6.2p . 



(2) 



Hence this bound, the equality above, the Cauchy-Schwarz inequality and Lemma 16.101 yield that 



i=l 



1 {X, e Xo) (m m - miX,)) 2 



< Cnbi ( sup E„ 

J<i<n 



1 e *o) (m,„ - m(X 2 )) 4 



1/2 



< O v {nbl)(bt ) + — d 



(A.3) 



For the conditional variance of T n , Lemma T6.12I gives 

n n n 

Var„(T„) = 53 Var " + ^2 ^2 Cov " (Cm, On) 



i=l 



i=l 3=1 



Therefore, since 61 goes to under (A\o), this order and (|A.3[) yield, applying the Tchebychev inequality, 



= O v 



K) (5; + _L) + („ M ^ („j + _|) + v ^ + _^ 

( n i,f + (.,i 1 )" 2 + ( n ^) 1/2 ) (»J + ^3 



which gives the result for T n . 

We now compute the order of R n . For this, define 



A" j 



1 {X t G # ) (m m - m(X,)r 



and note that $R_n = \sum_{i=1}^n R_{in}$. The order of $R_n$ is derived by computing its conditional mean and its conditional variance. For the conditional mean, observe that



E„ [R ri 



E, 



= E n 

with, using (A4) and Lemma W771 (|6. 31) . 



^ Ei„ [iZj, 

,i=i 

n 

1 (*i e # ) (nWn - m(Xi)) 3 E in [4 



|Ein[/in]| = 



eft 



< Cb\. 



Therefore the Holder inequality and Lemma 16.101 yield 

n 

|E„K]| < C6?^E„ [|l(X 4 e X )(m in -m(X i ))\ S 

i=l 
n 

< Cb\ K /A [l {Xi e Xo) (m in - m(Xi)) 4 

i=l 

< o P K)^ + _L)' 

For the conditional variance of $R_n$, note that Lemma 6.11 allows us to write

$$\mathrm{Var}_n(R_n) = \sum_{i=1}^n\mathrm{Var}_n(R_{in}) + \sum_{i=1}^n\sum_{j=1, j\neq i}^n 1\left(\|X_i - X_j\| < Cb_0\right)\mathrm{Cov}_n(R_{in}, R_{jn}),$$



3/2 



and consider the first term in (|A.5[) . We have 



11 (X, e Xo) (m m - m(X,)) 6 E m [/?„] 



Var„ (R in ) < E„ < E„ 

with, using (A4), the Cauchy-Schwarz inequality and Lemma 15771 - (16.31) . 



< C 



< C61, 



1 g (3) ^ fj ~ ~ m(Xj)) - 6 ^ (// 



g p)/ e-t(^-m(Jri))- 6 ) /(r , /( 



dt 



so that 



Var„ (i? m ) < Cb{& n [t (X, e X ) (m m - m(X,)) e 



(A.4) 



(A.5) 
Therefore, from Lemma 6.10, we deduce

n 

J2 Var„ (Ri n ) < Cnbi sup E n [t (X t G X ) (m in - m(Xi)) 6 ] 



< ¥ {nb 1 )[bt + -^J . (A.6) 

For the second term in (A.5), the Cauchy-Schwarz inequality gives, with the help of the above result for $\mathrm{Var}_n(R_{in})$,

|Cov„ (R in , R jn )\ < (Var„ (R in ) Var„ (R jn )) 1/2 

< Ch sup E„ [1 {Xt G Xo) (fh m - miX,)) 6 ] . 

Ki<n 



Hence by Lemma 16.101 and the Markov inequality, we have 

n n / \ 



i = l 3=1 X ' 

3 



This order, (TOt and (TO|) give, since nb^ diverges under (Ag), 



1 



1 



Var (R n ) = O p U4 + (n^h) • 



(n&? + (nX&i) 1 



Finally, with the help of this result, (|A.4[) and the Tchebychev inequality, we arrive at 

3/2 , ,„ / 1 \ 3 / 2 

= Op 

= o P 

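Proof of Lemma 6.7

Set $h_p(e) = e^pf(e)$, $p \in [0,2]$. For the first inequality of (6.1), note that under $(A_5)$ and $(A_8)$, the change of variable $e = \epsilon + b_1v$ gives, for any integer $\ell \in [1,3]$,

$$\left|\int K_1^{(\ell)}\left(\frac{e - \epsilon}{b_1}\right)^2e^pf(e)\,de\right| = \left|b_1\int K_1^{(\ell)}(v)^2h_p(\epsilon + b_1v)\,dv\right| \leq b_1\sup_{t\in\mathbb{R}}|h_p(t)|\int\left|K_1^{(\ell)}(v)\right|^2dv \leq Cb_1, \qquad (A.7)$$

which yields the first inequality in (6.1). For the second inequality in (6.1), observe that $f(\cdot)$ has a bounded continuous derivative under $(A_5)$, and that $\int K_1^{(1)}(v)dv = 0$ under $(A_8)$. Therefore, since $h_p(\cdot)$ has bounded second order derivatives under $(A_7)$, the Taylor inequality yields that

$$\left|\int K_1^{(1)}\left(\frac{e - \epsilon}{b_1}\right)e^pf(e)\,de\right| = b_1\left|\int K_1^{(1)}(v)\left[h_p(\epsilon + b_1v) - h_p(\epsilon)\right]dv\right| \leq b_1^2\sup_{t\in\mathbb{R}}\left|h_p^{(1)}(t)\right|\int\left|vK_1^{(1)}(v)\right|dv \leq Cb_1^2,$$

which completes the proof of (6.1).

The first inequalities of (6.2) and (6.3) follow directly from (A.7). The second bounds in (6.2) and (6.3) are proved simultaneously. For this, note that for any integer $\ell \in \{2,3\}$,

$$\int K_1^{(\ell)}\left(\frac{e - \epsilon}{b_1}\right)h_p(e)\,de = b_1\int K_1^{(\ell)}(v)h_p(\epsilon + b_1v)\,dv.$$

Under $(A_8)$, $K_1(\cdot)$ is symmetric, has a compact support and two continuous derivatives, with $\int K_1^{(\ell)}(v)dv = 0$ and $\int vK_1^{(\ell)}(v)dv = 0$ for $\ell \in \{2,3\}$. Hence, since by $(A_5)$ $h_p$ has bounded continuous second order derivatives, this gives for some $\theta = \theta(\epsilon, b_1v)$,

$$\left|\int K_1^{(\ell)}\left(\frac{e - \epsilon}{b_1}\right)h_p(e)\,de\right| = b_1\left|\int K_1^{(\ell)}(v)\left[h_p(\epsilon + b_1v) - h_p(\epsilon)\right]dv\right|$$

$$= b_1\left|\int K_1^{(\ell)}(v)\left[b_1vh_p^{(1)}(\epsilon) + \frac{b_1^2v^2}{2}h_p^{(2)}(\epsilon + \theta b_1v)\right]dv\right| = \frac{b_1^3}{2}\left|\int v^2K_1^{(\ell)}(v)h_p^{(2)}(\epsilon + \theta b_1v)\,dv\right|$$

$$\leq \frac{b_1^3}{2}\sup_{t\in\mathbb{R}}\left|h_p^{(2)}(t)\right|\int\left|v^2K_1^{(\ell)}(v)\right|dv \leq Cb_1^3. \;\square$$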
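The $b_1^2$ order in the second bound of (6.1) can be checked numerically for $p = 0$. The following sketch is only illustrative and rests on assumptions not made in the paper: $K_1$ is taken to be the biweight kernel, which is less smooth than $(A_8)$ may require, $f$ is the standard normal density, and the integral is approximated by a midpoint rule after the change of variable $e = \epsilon + b_1v$.

```python
import math


def biweight_d1(v: float) -> float:
    """First derivative of the biweight kernel K(v) = (15/16) * (1 - v**2)**2 on [-1, 1]."""
    return -3.75 * v * (1.0 - v * v) if abs(v) <= 1.0 else 0.0


def normal_pdf(e: float) -> float:
    return math.exp(-0.5 * e * e) / math.sqrt(2.0 * math.pi)


def derivative_kernel_integral(eps: float, b1: float, steps: int = 4000) -> float:
    """Midpoint-rule value of b1 * integral of K^(1)(v) * f(eps + b1*v) dv over [-1, 1]."""
    h = 2.0 / steps
    total = 0.0
    for k in range(steps):
        v = -1.0 + (k + 0.5) * h
        total += biweight_d1(v) * normal_pdf(eps + b1 * v)
    return b1 * total * h
```

Since $\int vK^{(1)}(v)dv = -1$ for this kernel, the ratio `derivative_kernel_integral(eps, b1) / b1**2` should approach $-f^{(1)}(\epsilon)$ as `b1` decreases, which is about 0.176 at $\epsilon = 0.5$.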

Proof of Lemma 6.8

Assumption $(A_4)$ and Lemma 6.7-(6.1) give

_i=l v 



E,, 



Var r , 



E 



K 



(i) ( e-e 
h 



i=l 



< 



i = l 



(i) f g-e 
61 



< Cn^ max \0 in \ , 

Ki<n 



< Cnfoi max 

Ki<n 



Hence the (conditional) Markov inequality gives 



(i) ( e j-e 



= a 



p (ra&i + (nfei) 1/2 ) max |/3 in | 

\ / l<2<n 
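so that the lemma follows if we can prove that

$$\sup_{1\leq i\leq n}|\beta_{in}| = O_P\left(b_0^2\right), \qquad (A.8)$$

as established now. For this, define

$$\zeta_j(x) = 1(x \in \mathcal{X}_0)\left(m(X_j) - m(x)\right)K_0\left(\frac{X_j - x}{b_0}\right), \qquad \nu_{in}(x) = \frac{1}{(n-1)b_0^d}\sum_{j=1, j\neq i}^n\left(\zeta_j(x) - E[\zeta_j(x)]\right),$$

and $\bar\nu_n(x) = E[\zeta_j(x)]/b_0^d$, so that

$$\beta_{in} = \frac{n-1}{n}\cdot\frac{\nu_{in}(X_i) + \bar\nu_n(X_i)}{\hat g_{in}}.$$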

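For $\max_{1\leq i\leq n}|\bar\nu_n(X_i)|$, first observe that a second-order Taylor expansion applied successively to $g(\cdot)$ and $m(\cdot)$ gives, for $b_0$ small enough, and for any $x$ in $\mathcal{X}_0$ and any $z$ in the support of $K_0$,

$$\left[m(x + b_0z) - m(x)\right]g(x + b_0z) = \left[b_0m^{(1)}(x)z + \frac{b_0^2}{2}zm^{(2)}(x + \zeta_1b_0z)z^T\right]\left[g(x) + b_0g^{(1)}(x)z + \frac{b_0^2}{2}zg^{(2)}(x + \zeta_2b_0z)z^T\right],$$

for some $\zeta_1 = \zeta_1(x, b_0z)$ and $\zeta_2 = \zeta_2(x, b_0z)$ in $[0,1]$. Therefore, since $\int zK_0(z)dz = 0$ under $(A_7)$, it follows that, by $(A_1)$, $(A_2)$ and $(A_3)$,

$$\max_{1\leq i\leq n}|\bar\nu_n(X_i)| \leq \sup_{x\in\mathcal{X}_0}|\bar\nu_n(x)| = \sup_{x\in\mathcal{X}_0}\left|\int\left(m(x + b_0z) - m(x)\right)K_0(z)g(x + b_0z)\,dz\right| \leq Cb_0^2. \qquad (A.9)$$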



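Consider now the term $\max_{1\leq i\leq n}|\nu_{in}(X_i)|$. The Bernstein inequality (see e.g. Serfling (2002)) and $(A_4)$ give, for any $t > 0$,

$$P\left(\max_{1\leq i\leq n}|\nu_{in}(X_i)| \geq t\right) \leq \sum_{i=1}^nP\left(|\nu_{in}(X_i)| \geq t\right) \leq \sum_{i=1}^n\int P\left(|\nu_{in}(x)| \geq t\,|\,X_i = x\right)g(x)\,dx$$

$$\leq 2n\exp\left(-\frac{(n-1)t^2}{2\sup_{x\in\mathcal{X}_0}\mathrm{Var}\left(\zeta_j(x)/b_0^d\right) + \frac{4M}{3b_0^d}t}\right),$$

where $M$ is such that $\sup_{x\in\mathcal{X}_0}|\zeta_j(x)| \leq M$. The definition of $\mathcal{X}_0$ given in $(A_2)$, $(A_3)$, $(A_7)$ and the standard Taylor expansion yield, for $b_0$ small enough,

$$\sup_{x\in\mathcal{X}_0}|\zeta_j(x)| \leq Cb_0, \qquad \sup_{x\in\mathcal{X}_0}\mathrm{Var}\left(\zeta_j(x)/b_0^d\right) \leq \frac{1}{b_0^d}\sup_{x\in\mathcal{X}_0}\int\left(m(x + b_0z) - m(x)\right)^2K_0^2(z)g(x + b_0z)\,dz \leq \frac{Cb_0^2}{b_0^d},$$

so that, for any $t > 0$,

$$P\left(\max_{1\leq i\leq n}|\nu_{in}(X_i)| \geq t\right) \leq 2n\exp\left(-\frac{(n-1)b_0^d\,t^2/b_0^2}{C + Ct/b_0}\right).$$

This gives

$$P\left(\max_{1\leq i\leq n}|\nu_{in}(X_i)| \geq \left(\frac{b_0^2\ln n}{(n-1)b_0^d}\right)^{1/2}t\right) \leq 2n\exp\left(-\frac{t^2\ln n}{C + Ct\left(\frac{\ln n}{(n-1)b_0^d}\right)^{1/2}}\right) = o(1),$$

provided that $t$ is large enough and under $(A_9)$. It then follows that

$$\max_{1\leq i\leq n}|\nu_{in}(X_i)| = O_P\left(\frac{b_0^2\ln n}{nb_0^d}\right)^{1/2}.$$

This bound, (A.9) and Lemma 6.1 show that (A.8) is proved, since $b_0^2\ln n/(nb_0^d) = O(b_0^4)$ under $(A_9)$, and that

$$\beta_{in} = \frac{n-1}{n}\cdot\frac{\nu_{in}(X_i) + \bar\nu_n(X_i)}{\hat g_{in}}.$$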



Proof of Lemma 6.9

Note that $(A_4)$ gives that $\Sigma_{in}$ is independent of $\varepsilon_i$, and that $E_n[\Sigma_{in}] = 0$. This yields



(i) ^ Ej - e 

61 



0. 



Moreover, observe that 



Var„ 



.i=l 

n 



(l) / Ei 
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ii «n- fi 1 



&1 



Z Z Cov " 

i=l 3=1 



bi 



^f 1 ^),^f^"' 



(A.10) 



(A.ll) 



For the sum of variances in (|A.11[) . Lemma [6771 - (|6. II) and (At) give 



^2 Var " 



where a 2 = Var(e) and 



(i) f Sj-e 
61 



< ^E n [E?„]E 

f,d\2 Z_* 



A' 



(l) / g j — e 
61 



< 



Cha 2 ^l(Xi£ X Q )g v 



— 3 — M^r - 



u Z — 1 



■E 



.9, 



1 ™ 

3 E ^ 



J = l,i^i 



A"j — A, 



bo 



(A.12) 
For the sum of conditional covariances in (A.11), observe that by $(A_4)$ we have



i=l 3 = 1 



EE Cov 

L 

71 ) 

EE E 

i=l 3=1 
n n 

EE 



(i) / e< 



(i) / e J 



Ej — e 



(1) / Ei 



Ei-e 



K 



h 

111/ £ j - £ 
61 



i=l 3=1 

3Vi 



(nb l ) 2 g m g jn 



EE^ 



fc=i £=1 



60 



^0 



X* — X., 



where 



£, k i = E k K{ 



(1) / Si 



Ei - e 



Moreover, under (A4), it is seen that for k ^ l 7 ^[£ki£ij] = when Card{i, j, k, £} > 3. Therefore the 
symmetry of K$ yields that 



EE Cov « 



i=i 3=1 



y k k 

*-*m-"- 1 



(1) / Ei 



Ei - e 



y ■ K y 



(1) / e J 



EH - € 



EE 

i=l 3=1 

iiti 



t(Xi G X )l(Xj G Xo) j^i ( Xj - X t \ w2 



{nb%) 2 g m g p 



b 



E 



(i) 



\p £ x )t(Xj g <y ) ^ 7^ ( Xk — x, 

(nb d )*g in g jr , ^ ° 



i=i 3=1 



k = l 
M*,3 



bo 



#0 



£ — e 



^ ^± \ Kl^W^ 



K 



(i) 



£ — e 



Therefore, since 



sup 

l<j'<n 



1 (Xj € Xp) 
\9jn\ 



o P (i) 



by Lemma HHl Lemma T6. 71 - (|6.1[) and (At) then give 



E E Cov « 



8=1 3=1 

3#< 



&1 



v. K-d) f fizi ^ v. k {1) ( £j ~ e 



6l 



= 0r ^)h — m — +0p{b ^h — m~ 



(A.13) 



where gi n is defined as in (|A.12[) and 



9in = , , d , 2 E E K ° ( 1 



Kn 



X k - X j 

bo 
The order of the first term in (A.13) follows from Lemma 6.1, which gives



E 



Iff* 



O r {n). 



(A.14) 



Again, by Lemma 16.11 we have 



i=i l^ 1 " i=i 

with, using the changes of variables xi = X3 + &o z i> £2 = £3 + 60^2 

i=i 



E 



il\2 



E 



"0 



b 



< 



Cn 3 b, 



These bounds and the equality above, give under (A 2 ) and (A?) 



\K (zi)K {z 2 )\ g{x3 + b zi)g(x 3 + b Z2)g{x 3 )dz 1 dz 2 dx 3 . 



l{Xi € X )\ 9i . 

ill Iffinl 



Op(ti). 



Hence from (|A.14|) , (|A.13|) . (|A.12|) . (|A.11I) and Lemma RTTl we deduce, for bi small enough, 



Var„ 



(1) 



i=l 



61 N 


i=l 






b d o 


b d 



l(X j € Xo)gi: 

gin 



l(Xi G X )\g v 



i=l 



i=l 



6l 



Finally, this order, (jA.101) and the Tchebychev inequality give 

n 



i = l 



61 



1/2 



.□ 



Proof of Lemma 16.101 

Define /%„ as in Lemma [5751 and set 
1 " 



9i 



4 / -^2 — -^m 



ffi. 



2=1,3?^ 

The proof of the lemma is based on the following bound: 



1 



2 / -^2' — 



2 = 1>2#« 



1 (Xi G * ) (m m - m(X,)) A 



< C 



A/2 



(n^)( fe / 2 )^ 



ft G {4,6}. 



(A.15) 
Indeed, taking successively $k = 4$ and $k = 6$ in (A.15), we have, by (A.8), Lemma 6.1 and $(A_9)$,



sup E n 

l<i<n 

sup E„ 

Ki<n 



1 (Xi G -*b) (m m - m(X0) 



!,fi\2 



1 



Op 6, 



which gives the results of the Lemma. Hence it remains to prove (A.15). For this, define $\beta_{in}$ and $\Sigma_{in}$ respectively as in Lemma 6.8 and Lemma 6.9. Since $1(X_i \in \mathcal{X}_0)\left(\hat m_{in} - m(X_i)\right) = \beta_{in} + \Sigma_{in}$, and that $\beta_{in}$ depends only on $(X_1, \ldots, X_n)$, this gives, for $k \in \{4, 6\}$,



1(X, G X Q ) (m in - m(Xi)Y 



<Cp? n +CE n [E*J 



(A.16) 



The order of the second term of bound (A.16) is computed by applying Theorem 2 in Whittle (1960) or the Marcinkiewicz-Zygmund inequality (see e.g. Chow and Teicher, 2003, p. 386). These inequalities show that for a linear form $L = \sum_{j=1}^na_j\zeta_j$ with independent mean-zero random variables $\zeta_1, \ldots, \zeta_n$, it holds that, for any $k \geq 1$,

$$E|L|^k \leq C(k)\left[\sum_{j=1}^na_j^2\,E^{2/k}|\zeta_j|^k\right]^{k/2},$$

where $C(k)$ is a positive real depending only on $k$. Now, observe that for any $i \in [1, n]$,
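A small exact computation may help to read this moment bound. The sketch below takes independent random signs for the $\zeta_j$ and $k = 4$, and enumerates all sign patterns to obtain $E|L|^4$ exactly. The function names are hypothetical, and the constant 3 used in the bound is the one that works for random signs with $k = 4$ (where $E L^4 = 3(\sum_j a_j^2)^2 - 2\sum_j a_j^4$); it is an assumption of this illustration, not the constant $C(k)$ of Whittle (1960).

```python
from itertools import product


def fourth_moment_rademacher(a):
    """Exact E|L|^4 for L = sum_j a_j * s_j with independent signs s_j = +/-1."""
    n = len(a)
    total = 0.0
    for signs in product((-1.0, 1.0), repeat=n):
        total += sum(aj * sj for aj, sj in zip(a, signs)) ** 4
    return total / 2 ** n


def moment_bound(a, k=4, const=3.0):
    """const * (sum_j a_j^2 * E^{2/k}|s_j|^k)^{k/2}, using E|s_j|^k = 1 for random signs."""
    return const * sum(aj * aj for aj in a) ** (k / 2)
```

For the weights `[1.0, 0.5, -2.0, 0.25]`, the exact fourth moment is 50.53515625 while the bound equals about 84.67, so the inequality holds with room to spare.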



1 (X e *b) 



SjK 



Xi — Xj 



Since under (A4), the <Jji n 's, j G [1, n], are centered independent variables given Xi, . . . , X n , this yields, for 
any k G {4, 6}, 

fe/2 



E„ [S m ] < CE [ e * 



1 (Xi G * ) 



Xj - Xj 



< 



Ct (XieX )g. 



fe/2 
in 



K)(*/a)g* B 



Hence this bound and (|A. 161) give 



1(X G A- )(rn m -m(X)) A 



< C 



fc , l(XeA )3, 



fe/2 



(n6g)(*/2)g*, 



which proves (A.15), and then completes the proof of the lemma.



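Proof of Lemma 6.11

Since $K_0(\cdot)$ has a compact support under $(A_7)$, there is a $C > 0$ such that $\|X_i - X_j\| \geq Cb_0$ implies that for any integer number $k$ of $[1, n]$, $K_0((X_k - X_i)/b_0) = 0$ if $K_0((X_j - X_k)/b_0) \neq 0$. Let $D_j \subset [1, n]$ be such that an integer number $k$ of $[1, n]$ is in $D_j$ if and only if $K_0((X_j - X_k)/b_0) \neq 0$. Abbreviate $P(\cdot|X_1, \ldots, X_n)$ into $P_n$ and assume that $\|X_i - X_j\| \geq Cb_0$, so that $D_i$ and $D_j$ have an empty intersection. Note also that taking $C$ large enough ensures that $i$ is not in $D_j$ and $j$ is not in $D_i$. It then follows, under $(A_4)$ and since $D_i$ and $D_j$ only depend upon $X_1, \ldots, X_n$,

$$P_n\left(\left(\hat m_{in} - m(X_i), \varepsilon_i\right) \in A \text{ and } \left(\hat m_{jn} - m(X_j), \varepsilon_j\right) \in B\right)$$

$$= P_n\left(\left(\frac{\sum_{k\in D_i\setminus\{i\}}\left(m(X_k) - m(X_i) + \varepsilon_k\right)K_0\left((X_k - X_i)/b_0\right)}{\sum_{k\in D_i\setminus\{i\}}K_0\left((X_k - X_i)/b_0\right)}, \varepsilon_i\right) \in A \text{ and } \left(\frac{\sum_{\ell\in D_j\setminus\{j\}}\left(m(X_\ell) - m(X_j) + \varepsilon_\ell\right)K_0\left((X_\ell - X_j)/b_0\right)}{\sum_{\ell\in D_j\setminus\{j\}}K_0\left((X_\ell - X_j)/b_0\right)}, \varepsilon_j\right) \in B\right)$$

$$= P_n\left(\left(\hat m_{in} - m(X_i), \varepsilon_i\right) \in A\right) \times P_n\left(\left(\hat m_{jn} - m(X_j), \varepsilon_j\right) \in B\right).$$

This gives the result of Lemma 6.11, since both $(\hat m_{in} - m(X_i), \varepsilon_i)$ and $(\hat m_{jn} - m(X_j), \varepsilon_j)$ are independent given $X_1, \ldots, X_n$. □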
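The geometric fact behind this proof, namely that two design points far enough apart share no kernel neighbours, can be illustrated in dimension one. The sketch below is a toy check under stated assumptions: the kernel is taken to vanish outside $(-1, 1)$ (the `support` parameter), the constant $C$ is taken equal to twice that support radius, and the function names and sample points are hypothetical.

```python
def neighbours(xs, j, b0, support=1.0):
    """Indices k with K0((X_j - X_k)/b0) != 0 for a kernel vanishing outside (-support, support)."""
    return {k for k, xk in enumerate(xs) if abs(xs[j] - xk) / b0 < support}


def disjoint_when_far(xs, i, j, b0, support=1.0):
    """Return (far, disjoint): far means |X_i - X_j| >= 2 * support * b0."""
    far = abs(xs[i] - xs[j]) >= 2.0 * support * b0
    disjoint = neighbours(xs, i, b0, support).isdisjoint(neighbours(xs, j, b0, support))
    return far, disjoint
```

By the triangle inequality, whenever `far` is true the two index sets cannot overlap, so the sums defining the two leave-one-out regression estimates involve different error terms.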



Proof of Lemma 6.12

Since $\hat m_{in} - m(X_i)$ depends only upon $(X_1, \ldots, X_n, \varepsilon_k, k \neq i)$, we have



X] Var„ (f in ) < £ E„ [Cf n ] - E » 



A" ) (m m - m(X)) E, 



(2) 



£» - e 



with, using Lemma 16. 71 (|6.2p . 



E,- 



(2) f 
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K (2) I j /(e)df; < 



Therefore these bounds and Lemma 16.101 give 

n n 

^Var„(Ci„) < C6!^E„ 



61 



1 (X G #„) (m m - m(X)) 4 



< CVt&i sup E„ 

Ki<n 



1 (Xi G Ab) (nWn - m(X)) 4 
2 
which yields the desired result for the conditional variance.

We now prepare to compute the order of the conditional covariance. To that aim, observe that Lemma 6.11 gives

n n n n , \ , 

]T C0V « On) = Yl £ 1 ( W Xi - < Cba ) ( E " KinOn] " E « [CtoJ E„ [&„] 



i=l i=i 



1 = 1 3 = 1 



The order of the term above is derived from the following equalities: 



Y 1 (\\ X i - X l\\ < Cb °) E » &»] E« [On] = P (n 2 6^i) (ft* + -If 



= 1 3=1 
3Vi 



n n / \ / h 

1=1 3 = 1 ^ ' V 



-)■ 



(A.17) 
(A.18) 



Indeed, since &i goes to under (Aw), (IA.17|) and (|A.18[) yield that 



££ C ° V n (Cm, Cjn) = Of 



i=l i = i 

3Vi 



(" 2 «>(^ + 4) 2+ (" 2 * ;,2 )( 6J+ i 



°'(" 2 « ,2 )(^ + i 



which gives the result for the conditional covariance. Hence, it remains to prove (IA.17[) and (|A.18|) . For 
(|A.L7|1 . note that by (A4) and Lemma l6~7l - (|6.2p . we have 



|E„ [C* 



< Cbi E. 



1 (Xi e Xo) (thin ~ m(Xi)) 2 Er, 



1 (Xi G Xo) (m m - m(Xi)) 4 



K 



(2) / Ej-e 



1 ' 6i 

1/2 



Hence from this bound and Lemma T6. 101 we deduce 



sup |E n [Ci„]E„[Cj„]| < Cb\ sup E„ 

l<i,j<n 1 < i < n 



Therefore, since the Markov inequality gives 

n n j 

££i(pr<-JOI|<C'&o) =0 P (n 2 6g) 

i = l 3 = 1 ^ 



1 (A, G A"o) (m m - m(A,)) 4 
1 ^ 2 



(A.19) 
it then follows that



£f>(pT< - Xj\\ < Cbo)K n [C m ]E„ [Q n ] = O p (n 2 b d bt) (b% + , 



8=1 3=1 



which proves (IA.17|) . 

For (A.18), set $Z_{in} = 1(X_i \in \mathcal{X}_0)\left(\hat m_{in} - m(X_i)\right)^2$, and note that for $i \neq j$, we have



En [C'inC/n] E n 



(2) Ej 



e, - e 



&i 



E, 



(2) f £ »~e 
bi 



where 



E; 



Z jn K 1 



(2) 



Sj - e 
h 



— B 2 E- 



(2) / £z 



E,; - £ 



2/3^ n E m 



(2) Ei 



Ei - e 



&i 



S 2 ^(2) / gj-e 



(A.20) 



(A.21) 



The first term of Equality (| A.21|) is treated by using Lemma 16.71 (|6.2[) . This gives 

,(2) f £i - 



A' 



3«2 
jn- 



(A.22) 



Since under (A4), the Ej's are independent centered variables, and are independent of the Xj's, the second 
term in (|A.21|) gives 



E; 



(2) 



Sj - e 



1 (gj £ *o) Xq 



Afe — Aj 



nbftg 



An 



J" 



&0 



E, 



E„, 



(2) 



£i - e 



(2) / £» — e 



61 



Therefore, by (A7) which ensures that ATn is bounded, the equality above and Lemma T6. 71 - (|6.2p yield that 



(2) / 



£i - e 



&1 



< Cb\ 



1 (Aj g Xo) 



nbffij, 



(A.23) 



For the last term in (| A.21I) . we have 



E, 



(2) 



£i - e 



1 / X fc - Aj 

{nbtg jn ?hik °V &o 



Kc ,*t-Xj 



E, 



£fe££ATi 



(2) Ei 



£i - e 
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1 



/ Xfe — X, 



E, 



2^(2) / £^ 






with, using Lemma |6. 7[ (|6.2[ ), 

E 



£kKl I b 1 



E; 



< max ^ sup 



i[c 2 ] SUp 



E, 



(2) 



£ — e 



Therefore 



E„ 



y2 k-P) f £ »~ e 



< Cftj \ "* K 2 ( X k - Xj 



6 



Substituting this bound, (|X23j) and (fA~22j) in (|AT2T1) . we obtain 



E; 



(2) / ~ e 



&1 



where 



M„ = sup 

l<j"<n 



/} 2 



1 (X, e Af ) 



nb$g jr 



1 



(nb%g jn ) 2 



E *3 



r2 ( Xk - X,j 



bo 



Hence from (|A.20[) . the Cauchy-Schwarz inequality, Lemma fa. 101 and Lemma r6.7K|6.2l) . we deduce 



n n / \ 

EE 1 ! ii^-^n <Cbo ) |En [CmC ^ ]| 

i = l 3=1 ^ ' 

3?" 

n n ✓ ^ 

< CM„6? £ ^ l (\\Xi - Xj || < Cb J E n 

i=i j=i ^ ' 



(2) ( Zj 



Ej — e 



n n 



i=l 3=1 



< cM n b\ E E 1 H x ' - x i\\ < Cb ° K /2 [ Z U 



jl/2 



K 



(2) / £i 



£i - e 



bi 



< M n b\O v (b% + ±) (6i) 1/2 E E f 1 (II - Xj\\ < Cb Q 



i=l 3=1 



Moreover, (IA.8I) and Lemma T6. II give, under (Ai), (Ay) and (Ag), 
Finally, substituting this order in the bound above, and using (|A. 19|i . we arrive at 

n n / \ / 1 

£I> ( Pfc " X/|| < C% JE„ [Ci»0«] - P (ti 2 6^I /2 ) ( 6* + 

j = l 3 = 1 ^ ' \ n 

This proves (A.18), and then completes the proof of the lemma.